(57) Abstract

An array of oligonucleotides on a solid substrate is disclosed, which can be used for
multiple purposes. Methods and reagents are provided for performing genotyping to
determine the identity or ratio of allelic forms of a gene in a sample. A single base
extension primer is coupled to a sequence identity code. During the primer extension
reaction a distinctive label is incorporated which identifies the allelic form present
in the sample. This permits multiple simultaneous analyses to be performed easily and
efficiently.

BACKGROUND OF THE INVENTION

Obtaining genotype information on thousands of polymorphic markers in a
highly parallel fashion is becoming an increasingly important task in mapping disease
loci, in identifying quantitative trait loci, in diagnosing tumor loss of heterozygosity,
and in performing linkage studies. A currently available method for simultaneously
obtaining large numbers of polymorphic marker genotypes involves hybridization to
allele specific probes on high density oligonucleotide arrays. In order to practice the
method, redundant sets of hybridization probes, typically twenty or more, are used to
score each marker. A high degree of redundancy is required, however, to reduce the
noise and achieve an acceptable level of accuracy. Even this level of redundancy is
often insufficient to unambiguously score heterozygotes or to quantitatively determine
allele frequency in a population. Thus, there is a need in the art for more reliable and
better quantitative methods to identify genotypes at polymorphic markers.

SUMMARY OF THE INVENTION

An array of oligonucleotide tags attached to a solid substrate is disclosed, along
with locus-specific tagged oligonucleotides. The array and the locus-specific tagged
oligonucleotides are particularly useful in genotyping using single base extension
reactions. When used together, the array and the locus-specific tagged oligonucleotides
serve as a "universal chip" system for use in genotyping, wherein by using different sets
of locus-specific tagged oligonucleotides the system can be tailored to any desired
genotyping application. For example, it is an object of the present invention to provide
a method to aid in determining a ratio of alleles at a polymorphic locus. It is another
object of the invention to provide a set of primers for use in determining a ratio of
nucleotides present at a polymorphic locus.

Thus, in one embodiment the invention relates to an array comprising one or
more oligonucleotide tags fixed to a solid substrate, wherein each oligonucleotide tag
comprises a unique known arbitrary nucleotide sequence of sufficient length to
hybridize to a locus-specific tagged oligonucleotide, wherein the locus-specific tagged
oligonucleotide has at its first end a nucleotide sequence which hybridizes to, e.g., is
complementary to, the arbitrary sequence of the oligonucleotide tag, and wherein the
locus-specific tagged oligonucleotide has at a second end a nucleotide sequence
complementary to a target polynucleotide sequence in a sample.
In one embodiment, the invention relates to a kit comprising an array comprising
one or more oligonucleotide tags fixed to a solid substrate, wherein each
oligonucleotide tag comprises a unique known arbitrary nucleotide sequence of
sufficient length to hybridize to a locus-specific tagged oligonucleotide, and one or
more locus-specific tagged oligonucleotides, wherein each locus-specific tagged
oligonucleotide has at its first (5') end a nucleotide sequence which hybridizes to, e.g., is
complementary to, the arbitrary sequence of a corresponding oligonucleotide tag on the
array, and has at its second (3') end a nucleotide sequence complementary to a target
polynucleotide sequence in a sample.

The invention further relates to a method of genotyping a nucleic acid sample at
one or more loci, comprising the steps of obtaining a nucleic acid sample to be tested;
combining the nucleic acid sample with one or more locus-specific tagged
oligonucleotides under conditions suitable for hybridization of the nucleic acid sample
to one or more locus-specific tagged oligonucleotides, wherein each locus-specific
tagged oligonucleotide comprises a nucleotide sequence capable of hybridizing to a
complementary sequence in an oligonucleotide tag and a nucleotide sequence
complementary to the nucleotide sequence 5' of a nucleotide to be queried in the
sample, thereby creating an amplification product-locus-specific tagged oligonucleotide
complex; subjecting the complex to a single base extension reaction, wherein the
reaction results in the addition of a labeled ddNTP to the locus-specific tagged
oligonucleotide, and wherein each type of ddNTP has a label that can be distinguished
from the label of the other three types of ddNTPs; contacting the complex with an
oligonucleotide array comprising one or more oligonucleotide tags fixed to a solid
substrate under suitable hybridization conditions, wherein each oligonucleotide tag
comprises a unique arbitrary sequence complementary and of sufficient length to
hybridize to a complementary sequence in a locus-specific tagged oligonucleotide,
whereby the complex hybridizes to a specific oligonucleotide tag on the array; and
assaying the array to determine the labeled ddNTPs present in the complex hybridized
to one or more oligonucleotide tags, thereby determining the genotype of the queried
nucleotide in the sample. In one embodiment the nucleic acid sample to be tested is
amplified.

In one embodiment a method is provided to aid in determining a ratio of alleles
at a polymorphic locus in a sample. A pair of primers is used to amplify a region of a
nucleic acid in a sample. In one embodiment, the region comprises a polymorphic
locus, and an amplified nucleic acid product is formed which comprises the
polymorphic locus. The amplified nucleic acid product is used as a template in a single
base extension reaction with an extension primer, forming a labeled extension primer.
The extension primer (also called a locus-specific tagged oligonucleotide herein)
comprises a 3' portion and a 5' portion. The 3' portion is complementary to the
amplified nucleic acid product and terminates one nucleotide 5' to the polymorphic
locus. The 5' portion is not complementary to the amplified nucleic acid product. A
labeled dideoxynucleotide which is complementary to the polymorphic locus is coupled
to the 3' end of the extension primer. Each type of dideoxynucleotide present in the
reaction bears a distinct label. The 5' portion of the extension primer is hybridized to
one or more probes (also called oligonucleotide tags herein) which are immobilized to
known locations on a solid support. The probes comprise a nucleotide sequence which
is complementary to the 5' portion of the extension primer.

Also provided by the present invention is a set of primers for use in determining
a ratio of nucleotides present at a polymorphic locus. The set includes a pair of
amplification primers and an extension primer. The pair of primers prime synthesis of a
region of double stranded nucleic acid which comprises a polymorphic locus. The
extension primer comprises a 3' portion which is complementary to a portion of the
region of double stranded nucleic acid and a 5' portion which is not complementary to
the region of double stranded nucleic acid. The extension primer terminates one
nucleotide 5' to the polymorphic locus. Examples of primers according to the invention
are shown in Table 1.

Another embodiment of the invention provides a method to aid in determining a
ratio of alleles at a polymorphic locus in a sample. Any nucleic acid molecule,
including genomic DNA, which comprises one or more polymorphic loci is used as a
template in a single base extension reaction with an extension primer, forming a labeled
extension primer. The extension primer comprises a 3' portion and a 5' portion. The 3'
portion is complementary to the nucleic acid molecule and terminates one nucleotide 5'
to the polymorphic locus. The 5' portion is not complementary to the nucleic acid
molecule. A labeled dideoxynucleotide which is complementary to the polymorphic
locus is coupled to the 3' end of the extension primer. Each type of dideoxynucleotide
present in the reaction bears a distinct label. The 5' portion of the extension primer is
hybridized to one or more probes which are immobilized to known locations on a solid
support.



These and other embodiments of the invention, which are described in more
detail below, provide the art with methods and tools for rapidly and easily determining
genotypes of individuals and allele frequencies in populations.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a diagram of the universal array. The solid substrate (e.g., a glass slide)
is depicted on the left, and different oligonucleotide tags ("A", "B", "C", etc.) are shown
attached to the solid substrate. The nucleotide sequence on the right-hand end of each
oligonucleotide tag ("Tag A", "Tag B", "Tag C") is arbitrary unique sequence; that is, it
is designed and synthesized to be unique to each oligonucleotide tag.

Fig. 2 is a diagram depicting a locus-specific tagged oligonucleotide. The
nucleotide sequence at the left-hand end is complementary to the arbitrary sequence of
one of the oligonucleotide tags depicted in Fig. 1. The nucleotide sequence at the
right-hand end is complementary to the amplification product of a known polymorphic locus
(e.g., a single nucleotide polymorphism (SNP)). Therefore, locus-specific tagged
oligonucleotide "A" comprises a nucleotide sequence complementary to the arbitrary
sequence of the "Tag A" oligonucleotide tag depicted in Fig. 1, and also comprises
sequence complementary to SNP "A".
Fig. 3 is a diagram showing the hybridization of the locus-specific tagged
oligonucleotide to the amplification product. The locus-specific sequence (right-hand
end) of the oligonucleotide is designed so that it terminates one nucleotide immediately
before (5' of) the nucleotide to be genotyped (shown in box).

Fig. 4 is a diagram depicting the labeling of the locus-specific tagged
oligonucleotide-amplification primer complex via single base extension. During the
reaction, a single labeled ddNTP complementary to the queried nucleotide is
enzymatically added to the 3' end of the locus-specific tagged oligonucleotide. The
nucleotide is shown in the box.

Fig. 5 is a diagram depicting the hybridization of the complex of the
amplification product and the locus-specific tagged oligonucleotide to the
oligonucleotide tags on the array. The solid substrate to which the oligonucleotide tags
of the array are bound is shown on the left, with the individual addresses labeled as "A",
"B", etc. Each oligonucleotide tag is shown at its address. The locus-specific tagged
oligonucleotide is shown hybridized to the oligonucleotide tag, and the amplification
product is in turn bound to the locus-specific tagged oligonucleotide. The locus-specific
tagged oligonucleotide is bound to a labeled (■, •, etc.) nucleotide as a result of single
base extension. Although a single complex is shown at each address, in reality, many
such oligonucleotide tags are located at each address; that is, the substrate surface at
address "A" has many copies of oligonucleotide tag "A" attached to it, etc.

Fig. 6 is a diagram depicting the hybridization as in Fig. 5, but the sample at
address "B" is heterozygous for the queried nucleotide.

Fig. 7 is a schematic showing the combined use of amplification, single base
extension of a tagged primer, and hybridization to a tag array.

Fig. 8 shows a quantitative measurement of allele frequency. Template-T
(5'-TGCTGAATATTCAGATTCTCTAGTGCTACCTGAAAGATCCTG-3'; SEQ ID NO: 1) and
Template-G
(5'-TGCTGAATATTCAGATTCTCGAGTGCTACCTGAAAGATCCTG-3'; SEQ ID NO: 2)
were mixed at different ratios (6 nM/60 nM, 6 nM/18 nM, 6 nM/6 nM, 18 nM/6 nM,
60 nM/6 nM, 180 nM/6 nM). Six SBE primers
(5'-CACCATGCTCACAATGAATGCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 3);
5'-GATAATTCTCTGATAGGCCGCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 4);
5'-GACTACGATGTGATCCGTGTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 5);
5'-GAACGCAGTTATCAGACTCTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 6);
5'-CGAGGACATGGAGTCACATCCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 7); and
5'-GCTAGGCATTCCTCCAGTGTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 8))
were separately added to six SBE reactions which contain the mixed templates
of different ratios. The SBE primers were extended in the presence of biotin-labeled
ddATP and fluorescein-labeled ddCTP (see Examples) and pooled and hybridized to the
tag array. The intensity ratio of the two colors (the y-axis) was plotted against the ratio
of the mixed two templates (the x-axis).
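
The choice of ddATP and ddCTP for this experiment can be checked directly from the sequences of SEQ ID NO: 1 to 3. The following is a minimal Python sketch, not part of the disclosed method: the function names and the assumption that the last 21 nucleotides of the SBE primer form the template-binding 3' portion are illustrative only.

```python
# Sequences as listed for Fig. 8 (SEQ ID NO: 1, 2 and 3), written 5'->3'.
TEMPLATE_T = "TGCTGAATATTCAGATTCTCTAGTGCTACCTGAAAGATCCTG"
TEMPLATE_G = "TGCTGAATATTCAGATTCTCGAGTGCTACCTGAAAGATCCTG"
SBE_PRIMER_3 = "CACCATGCTCACAATGAATGCAGGATCTTTCAGGTAGCACT"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def incorporated_ddntp(template, primer, anneal_len=21):
    """Return the base of the ddNTP added in a single base extension.

    anneal_len is an assumption: the number of 3'-terminal primer
    nucleotides that pair with the template (the rest is the tag).
    """
    # Template sequence (5'->3') that the primer's 3' portion pairs with.
    site = reverse_complement(primer[-anneal_len:])
    start = template.find(site)
    if start <= 0:
        raise ValueError("primer does not anneal, or no base left to query")
    # The queried nucleotide lies immediately 5' of the annealing site
    # on the template; the polymerase adds its complement.
    return COMPLEMENT[template[start - 1]]


if __name__ == "__main__":
    print("Template-T:", "dd" + incorporated_ddntp(TEMPLATE_T, SBE_PRIMER_3) + "TP")
    print("Template-G:", "dd" + incorporated_ddntp(TEMPLATE_G, SBE_PRIMER_3) + "TP")
```

Under these assumptions Template-T directs incorporation of ddATP and Template-G directs incorporation of ddCTP, so the two labels report the two templates.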

Fig. 9 shows a clustering analysis of the tag array hybridization results in 44 
individuals at marker GMP- 140.25. 

DETAILED DESCRIPTION OF THE INVENTION 

The invention features a generic or universal genotyping array, consisting of
oligonucleotide tags attached to a solid substrate (Fig. 1). Each address in the array
(e.g., "A", "B", "C", etc.) has an oligonucleotide tag associated with it. The
oligonucleotide tag at a given address is attached to the solid substrate, and comprises a
unique arbitrary nucleotide sequence. That is, the nucleotide sequence is unique for the
oligonucleotide tag at each address, i.e., the nucleotide sequence for "tag A" is different
from the nucleotide sequence for all other tags in the array. The nucleotide sequence for
each tag is arbitrary in that it can be any sequence, provided that it is different from the
nucleotide sequence for every other tag in the array. Preferably the oligonucleotide tag
is from about 20 to about 50 nucleotides in length. It may also be desirable to design
the nucleotide sequence of the oligonucleotide tag such that it does not facilitate an
undesirable interaction, e.g., with the target nucleic acid molecule (amplified product).
The oligonucleotide array is used in conjunction with locus-specific tagged
oligonucleotides. Each oligonucleotide tag in the array corresponds to a locus-specific
tagged oligonucleotide. One end (the 5' end) of the locus-specific tagged
oligonucleotide comprises a nucleotide sequence complementary to the unique arbitrary
sequence of its corresponding oligonucleotide tag (Fig. 2). Preferably, this sequence is
from about 20 to about 30 nucleotides long. The other end (the 3' end) of the
locus-specific tagged oligonucleotide is complementary to a target nucleic acid molecule
comprising a nucleotide to be queried, e.g., a polymorphic nucleotide. Preferably, the 3'
end of the locus-specific tagged oligonucleotide is synthesized such that when hybridized to
the target nucleic acid molecule the locus-specific tagged oligonucleotide terminates
one nucleotide 5' to the nucleotide to be queried. The portion of the locus-specific
tagged oligonucleotide which hybridizes to the target nucleic acid molecule is
preferably from about 15 to about 50 nucleotides long. For example, the 5' end of
locus-specific tagged oligonucleotide "A" would be complementary to the unique
arbitrary sequence at the end of the oligonucleotide tag "A" which is bound to address
"A" in the array. The 3' end of locus-specific tagged oligonucleotide "A" would be
complementary to the polynucleotide sequence 5' of the nucleotide to be queried in
target "A".
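
The two-part structure of a locus-specific tagged oligonucleotide can be illustrated with a short Python sketch. The function name, its parameters, and the short example sequences are hypothetical and chosen only for readability; real tags and locus-specific portions would have the preferred lengths given above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_tagged_oligo(array_tag, target, query_index, anneal_len=20):
    """Assemble a locus-specific tagged oligonucleotide (5'->3').

    array_tag   -- sequence of the tag fixed on the array, 5'->3'
    target      -- target strand to be genotyped, 5'->3'
    query_index -- 0-based position of the nucleotide to be queried
    anneal_len  -- length of the locus-specific 3' portion (assumed)
    """
    # Target bases the 3' portion pairs with: they start right next to
    # the queried nucleotide, so extension by one base reads that position.
    site = target[query_index + 1 : query_index + 1 + anneal_len]
    if len(site) < anneal_len:
        raise ValueError("not enough target sequence flanking the queried base")
    five_prime = reverse_complement(array_tag)   # hybridizes to the array tag
    three_prime = reverse_complement(site)       # hybridizes to the target
    return five_prime + three_prime


if __name__ == "__main__":
    # Hypothetical, deliberately short sequences.
    print(design_tagged_oligo("AAGGCT", "TTGACGTCA", 3, anneal_len=4))
```

The 5' portion depends only on the array tag and the 3' portion only on the locus, which is why the same array can be reused with a different set of oligonucleotides.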

To genotype a nucleic acid sample from an individual at locus "A",
amplification primers specific for the region containing locus "A" are used to amplify
the nucleic acid molecules in the sample. Locus-specific tagged oligonucleotides
complementary to the nucleotide sequence 5' of locus "A" are combined with the
amplification products under conditions suitable for hybridization (Fig. 3). The
hybridization complex is subjected to single base extension. The four types of ddNTPs
in the reaction mixture have different labels (e.g., four different fluorescent tags, e.g.,
the ddATPs would have an attached fluorophore that fluoresced at a first wavelength,
the ddCTPs would have an attached fluorophore that fluoresced at a second wavelength,
the ddGTPs would have an attached fluorophore that fluoresced at a third wavelength,
and the ddTTPs would have an attached fluorophore that fluoresced at a fourth
wavelength). During the single base extension reaction, a single ddNTP is attached
(Fig. 4), resulting in the formation of a complex composed of the locus-specific tagged
oligonucleotide extended with the labeled ddNTP and the amplification product.

After the single base extension reaction, the complex of the labeled (extended)
locus-specific tagged oligonucleotide and the amplification product is hybridized to the
array (Fig. 5). The oligonucleotide tag "A" at address "A" selectively hybridizes to its
corresponding locus-specific tagged oligonucleotide (now extended with a labeled
ddNTP), the oligonucleotide tag "B" at address "B" selectively hybridizes to its
corresponding locus-specific tagged oligonucleotide (now extended with a labeled
ddNTP), etc. The array is assayed to determine which label(s) is (are) present at which
address on the array. For instance, if address "A" fluoresced at the same wavelength as
the label on the ddATP, then the amplification product clearly contained a "T" at the
queried nucleotide (because the single base extension reaction attaches the ddNTP
complementary to the queried nucleotide). Fluorescence at a wavelength which is the
same as the ddCTP label would indicate that the genotype was a "G", etc. Detection of
two peaks within the wavelength emitted would indicate that different nucleotides were
present at the queried position in the sample, e.g., that the individual was heterozygous
at that locus.
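
The read-out logic described above (the detected label identifies the ddNTP, whose complement is the queried nucleotide) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the detection step has already been reduced to a list of ddNTP bases whose labels were observed at one address.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_genotype(detected_ddntps):
    """Translate ddNTP labels seen at one address into template bases.

    detected_ddntps -- iterable of 'A', 'C', 'G' or 'T', one entry per
                       distinct label detected at the address.
    Returns the sorted list of nucleotides present at the queried position.
    """
    bases = sorted({COMPLEMENT[b] for b in detected_ddntps})
    if not bases:
        raise ValueError("no label detected at this address")
    return bases


def describe(detected_ddntps):
    """Return (alleles, zygosity) for a sample from a single individual."""
    bases = call_genotype(detected_ddntps)
    zygosity = "homozygous" if len(bases) == 1 else "heterozygous"
    return "/".join(bases), zygosity


if __name__ == "__main__":
    print(describe(["A"]))        # ddATP label only
    print(describe(["A", "C"]))   # ddATP and ddCTP labels together
```

A single label is reported here as homozygous, which holds only for a sample from one individual; for pooled samples the relative label intensities matter instead.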

An advantage of the array and method described herein is that many addresses
can be assayed simultaneously, producing genotyping data for many different genetic
loci, e.g., SNPs. By utilizing a predefined set of locus-specific tagged oligonucleotides,
e.g., a set specific for assaying a set of genetic diseases, a single array can be utilized for
a particular purpose, and by utilizing a different set of locus-specific tagged
oligonucleotides which correspond to the same tags on the array, the same array can be
utilized for a different purpose. The universal chip serves as the repository of a set of
addresses, to which the locus-specific tagged oligonucleotides (along with the labeled,
genotyped SNPs) hybridize in a planned, predetermined manner. The array and sets of
locus-specific tagged oligonucleotides can therefore be used as components in kits for
the purposes of sequencing and genotyping. Sets of locus-specific tagged
oligonucleotides can therefore be used in combination with arrays as described herein
for use in forensics, identification of individuals, and disease diagnosis/prognosis.

The present invention provides a convenient and accurate way to determine the
genotype of an individual at a polymorphic locus or the [...]. One embodiment of the
method involves [...] for different polynucleotides at a polymorphic locus, and
hybridization [...] array. The ratio of the distinct labels can be determined at known
positions on the array. Each [...] represents a distinct allelic form at the polymorphic
locus. The method permits the simultaneous [...] of allele frequencies in a population.
Another embodiment [...].

Advantages of the disclosed method include that [...] be used to genotype any genetic
marker, i.e., no special customized genotyping chip is needed. In addition, the
pre-selected probe sequences synthesized on the tag chip [...] good hybridization results
between the probe and the [...]. Moreover, the [...] color or multiple color approach
used in this assay provides accurate [...] be obtained not only for individual samples,
but also for pooled samples.

A pair of primers or a single primer can be used to amplify a region of a nucleic
acid in a sample. The sample may be from a single individual or may be from a
population of individuals. The region which is amplified includes a polymorphic locus.
The step of amplification is not specific for a particular allele. However, the
amplification is designed to specifically amplify regions of double stranded or single
stranded nucleic acids which contain polymorphic loci.

The amplification step may be carried out using any technique known in the art.
One preferred technique is polymerase chain reaction (PCR) in which DNA is amplified
logarithmically. As is known in the art, each primer of a pair of amplification primers
hybridizes to, and is preferably complementary to, opposite strands of an allele. It is
preferred that the primers hybridize to a double stranded nucleic acid in regions which
are not more than [...] kb apart, and preferably which are much closer together, such as not
more than [...] kb, 0.5 kb, 0.2 kb, 0.1 kb, 0.01 kb or 0.001 kb apart. A suitable DNA
polymerase can be used as is known in the art. Thermostable polymerases are
particularly convenient for thermal cycling of rounds of primer hybridization,
polymerization, and melting. Amplification of single stranded nucleic acids can also
be employed.

After the amplification it is desirable to remove and/or degrade any excess
primers and nucleotides. This can be done by washing and/or enzymatic degradation,
using such enzymes as endonuclease I and alkaline phosphatase, for example. Other
techniques, such as chromatography, magnetic beads, and avidin- or
streptavidin-conjugated beads, as are known in the art for accomplishing the removal, can also be
used. It is not necessary to remove or destroy one of two strands of an amplified DNA
product.

The primer extension step of the method is the one which provides
allele-specificity to the method. The primer is designed to terminate one nucleotide 5' to the
polymorphic locus. The primer is hybridized to the denatured amplified double
stranded DNA. When the primer is extended by a single base using dideoxynucleotides
and a DNA polymerase, the dideoxynucleotide which is complementary to the
nucleotide at the polymorphic locus is added. Again, any DNA-dependent DNA
polymerase can be used. These include, but are not limited to, E. coli DNA polymerase
I, Klenow fragment of polymerase I, T4 DNA polymerase, T7 DNA polymerase, and T.
aquaticus DNA polymerase. This reaction is preferably performed at the Tm of the
primer with the template to enhance product formation.
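
Where a first estimate of the melting temperature of the template-binding portion of a primer is wanted, a commonly used rule of thumb for short oligonucleotides is the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius. The sketch below is an illustration only and is not taken from this disclosure; the rule ignores salt and primer concentration, and a nearest-neighbor calculation or empirical optimization would normally be preferred when designing an actual assay.

```python
def wallace_tm(seq):
    """Rough melting temperature (degrees C) by the Wallace rule.

    Tm = 2 * (number of A and T) + 4 * (number of G and C).
    Intended only as a coarse estimate for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Shared 3' portion of the SBE primers of SEQ ID NO: 3 to 8.
    print(wallace_tm("CAGGATCTTTCAGGTAGCACT"))
```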

One configuration for carrying out the primer extension step utilizes two
different primers which each hybridize to opposite strands of an amplified double
stranded DNA. Each primer terminates one nucleotide 5' to the polymorphic locus. The
primer extension reaction may be more robust with one strand as a template than the
other. In addition, the information obtained from the second strand should confirm the
information obtained from the first strand.

An alternative method for primer extension involves use of reverse transcriptase
and one or two primers which hybridize 3' to the polymorphic locus. This method may
be desirable in cases where "forward" direction primer extension is less robust than is
desirable.

Each different dideoxynucleotide present in the single base extension reaction is
uniquely labeled. The unique label can be detected and its amount will be proportional
to the amount of the particular allele containing the corresponding deoxynucleotide in
the sample. If the sample is from a single individual, the nucleotide bases present at the
polymorphic locus can be determined. If the sample is from a population of individuals
the allele frequency in the population can be determined.
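
Because the amount of each label is proportional to the amount of the corresponding allele, allele fractions in a pooled sample can be estimated from the label intensities at one address. The following Python sketch shows one simple way to do this; the function name and the optional per-label response factors (which would come from a calibration such as the template mixing series of Fig. 8) are assumptions made for illustration.

```python
def allele_fractions(intensities, response=None):
    """Estimate allele fractions from label intensities at one address.

    intensities -- dict mapping a label (e.g. the ddNTP base) to its signal
    response    -- optional dict mapping a label to its relative signal per
                   unit of allele; labels not listed default to 1.0
    """
    response = response or {}
    # Divide out the per-label response so that signals are comparable.
    corrected = {
        label: signal / response.get(label, 1.0)
        for label, signal in intensities.items()
    }
    total = sum(corrected.values())
    if total <= 0:
        raise ValueError("no signal detected")
    return {label: value / total for label, value in corrected.items()}


if __name__ == "__main__":
    # Hypothetical intensities for the two labels at one address.
    print(allele_fractions({"A": 300.0, "C": 100.0}))
    print(allele_fractions({"A": 300.0, "C": 100.0}, response={"A": 2.0}))
```

Without calibration the two labels are treated as equally bright, which is rarely exact for different label chemistries; the response factors are where such a correction would enter.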

The ability to perform the method of the present invention in a multiplex manner
for a number of different polymorphic loci simultaneously is due to the sequence tags
which are present on the extension primers at their 5' ends. The sequence tags permit
the method operator to ultimately sort the products of multiplex amplification and
multiplex primer base extension to different locations on an array. Each sequence tag
on an extension primer is used only for a single polymorphic locus. Thus the products
of primer extension reactions can be separately analyzed because they can be hybridized
to distinct known locations on an array.

The sequence tags are typically totally unrelated to the sequences of the
polymorphic alleles which are being analyzed. The sequence tags are chosen for their
favorable hybridization characteristics. The tags are typically selected so that they have
similar hybridization characteristics and minimal cross-hybridization to other tag
sequences. Each sequence tag is attached to a specific gene or genetic marker, and then
serves as a label for that particular gene or genetic marker. A generic tag array
corresponding to the pre-selected tag sequences is fabricated and used to detect the
presence or absence or ratio of specific allelic forms in a test sample. See application
Serial No. 08/626,285 filed April 4, 1996, and EP application no. 97302313.8, which
are expressly incorporated by reference herein.
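
One simple way to screen candidate tags for similar hybridization characteristics and low cross-hybridization is sketched below in Python. It is not the selection procedure of the incorporated applications: GC content is used as a crude proxy for hybridization behavior, Hamming distance between equal-length tags as a crude proxy for cross-hybridization risk, and the thresholds and function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_tags(candidates, gc_range=(0.4, 0.6), min_distance=8):
    """Greedily keep tags with similar GC content that differ enough.

    candidates   -- equal-length candidate tag sequences, in priority order
    gc_range     -- accepted (min, max) GC fraction (assumed thresholds)
    min_distance -- minimum Hamming distance to every tag already chosen
    """
    chosen = []
    for tag in candidates:
        if not gc_range[0] <= gc_fraction(tag) <= gc_range[1]:
            continue  # hybridization behavior likely differs from the rest
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
    return chosen


if __name__ == "__main__":
    # Hypothetical short candidates, for illustration only.
    pool = ["ACGTACGTAC", "ACGTACGTAG", "TTGCAGCCAT", "AAAAAAAAAT"]
    print(select_tags(pool, min_distance=3))
```

A practical design would also screen each tag against the target sequences and against the reverse complements of the other tags.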

The labels which are used can be any which are known in the art. These include
radiolabels, fluorescent labels, enzyme labels, epitope labels, and high affinity binding
partner labels. Examples include isotopically labeled nucleotides, fluorescein-labeled
nucleotides, biotin-labeled nucleotides, and digoxin-labeled nucleotides. A different label is
assigned to each base dideoxynucleotide in the single base extension reaction. Two,
three, or four different labels can be used in the reaction. The different labels can be all
of the same type, e.g., enzyme labels, or they can be mixed types.

Hybridization of the 5' portion of the extension primers (the tag sequences) to
one or more probes which are immobilized to known locations on a solid support is also
contemplated. Hybridization can be performed under standard conditions known in the
art for obtaining robust signals at high specificity. Standard washing conditions can
also be employed. Detection of hybridization of the extension primers can be done
using standard means, depending on the type of labels used. For example, fluorescence
can be detected and quantified using optical detection means. Radiolabels can be
detected using autoradiography or scintillation counting. Enzyme labels can be detected
using enzymatic reactions and assaying for the final product of the enzyme reaction.
Antigenic labels can be detected using immunological detection means. Affinity binding
partners such as streptavidin or avidin and biotin can also be used as a label.

The reactions of the present invention can be performed in a single or multiplex
format. For example, the amplification step can be performed using up to 20, 30, 40,
50, 75, 100, 150, 200, 250, or 300 different primer pairs to amplify a corresponding
number of polymorphic markers. These can be pooled for the single base extension
reaction, if desired. Pooling for the hybridization step is desirable so that thousands of
hybridizations can be done simultaneously.

In an alternative embodiment the amplification step can be omitted. Thus, if
sufficient DNA is available, the single base extension reaction can be performed directly
on genomic DNA. In another particular embodiment, amplification of the entire
genome can be performed using random primers.

Sets of primers according to the present invention comprise an amplification pair
and an extension primer. These are used together in a method for determining a ratio of
nucleotides present at a polymorphic locus. These may be packaged in a single
container, preferably a divided container or package. The pair of primers amplify a
region of double stranded DNA which comprises a polymorphic locus. The extension
primer has two portions, a 3' portion which is complementary to a portion of the region
of double stranded DNA which contains the polymorphic locus and a 5' portion which is
not complementary to the region of double stranded DNA. The 5' region is the tag
sequence which is complementary to the tag array which is used to sort and analyze the
products of the single base extension reaction. The end of the single base extension
primer terminates one nucleotide 5' to the polymorphic locus.

Kits according to the present invention may contain one or more sets of primers
as described above. The kit may also contain a solid support comprising at least one
probe which is attached to the solid support. The one or more probes are
complementary to the 5' portion of the extension primer, i.e., to the tag sequences.
Solid supports according to the present invention include beads, microtiter plates, and
arrays.

Hybridizing Nucleic Acids to Arrays of Allele-Specific Probes

"Hybridization" refers to the formation of a bimolecular complex of two
different nucleic acids through complementary base pairing. Complementary base
pairing occurs through non-covalent bonding, usually hydrogen bonding, of bases that
specifically recognize other bases, as in the bonding of complementary bases in
double-stranded DNA. In this invention, hybridization is carried out between a target nucleic
acid, which is prepared from the nucleic acid sample by allele-specific amplification,
and at least two probes which have been immobilized on a substrate to form an array.

One of skill in the art will appreciate that an enormous number of array designs
are suitable for the practice of this invention. An array will typically include a number
of probes that specifically hybridize to the sequences of interest (tags). In addition, it is
preferred that the array include one or more control probes. In one embodiment, the
array is a high density array. A high density array is an array used to hybridize with a
target nucleic acid sample to detect the presence of a large number of allelic markers,
preferably more than 10, more preferably more than 100, and most preferably more than
1000 allelic markers.

High density arrays are suitable for quantifying small variations in the frequency
of an allelic marker in the presence of a large population of heterogeneous nucleic acids.
Such high density arrays can be fabricated either by de novo synthesis on a substrate or
by spotting or transporting nucleic acid sequences onto specific locations of a substrate.
Both of these methods produce nucleic acids which are immobilized on the array at
particular locations. Nucleic acids can be purified and/or isolated from biological
materials, such as a bacterial plasmid containing a cloned segment of a sequence of
interest. Suitable nucleic acids can also be produced by amplification of templates or
by synthesis. As a nonlimiting illustration, polymerase chain reaction and/or in vitro
transcription are suitable nucleic acid amplification methods.

The term "target nucleic acid" refers to a nucleic acid (either synthetic or derived 
from a biological sample or nucleic acid sample), to which the probe is designed to 
specifically hybridize. In this invention, such target nucleic acids are the same as the
sequence tags. It is either the presence or absence of the target nucleic acid that is to be
detected, or the amount of the target nucleic acid that is to be quantified. The target
nucleic acid has a sequence that is complementary to the nucleic acid sequence of the
corresponding probe directed to the target. The term "target nucleic acid" can refer to
the specific subsequence of a larger nucleic acid to which the probe is directed or to the
overall sequence (e.g., gene or mRNA) whose presence it is desired to detect. The
difference in usage will be apparent from context. 

As used herein a "probe" is defined as a nucleic acid capable of binding to a
target nucleic acid of complementary sequence through one or more types of chemical
bonds, usually through complementary base pairing, usually through hydrogen bond
formation. As used herein, a probe can include natural (i.e., A, G, U, C, or T) or
modified bases (e.g., 7-deazaguanosine, inosine, etc.). A probe can also include an
oligonucleotide. An oligonucleotide is a single-stranded nucleic acid of 2 to n bases,
where n can be any integer less than 1000. Nucleic acids can be cloned or synthesized
using any technique known in the art. They can also include non-naturally occurring
nucleotide analogs, such as those which are modified to improve hybridization, and
peptide nucleic acids. In addition, the bases in probes may be joined by a linkage other
than a phosphodiester bond, so long as it does not interfere with hybridization. Thus,
probes may be peptide nucleic acids in which the constituent bases are joined by peptide
bonds rather than phosphodiester linkages.

Probe Design

An array includes "test probes", also termed "oligonucleotide tags" herein. Test
probes can be oligonucleotides that range from about 5 to about 45 or 5 to about 500
nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably
from about 15 to about 40 nucleotides in length. In other particularly preferred
embodiments the probes are 20 to 25 nucleotides in length. In another embodiment,
test probes are double or single stranded DNA sequences. DNA sequences can be
isolated or cloned from natural sources or amplified from natural sources using natural
nucleic acids as templates. However, in situ synthesis of probes on the arrays is
preferred. The probes have sequences complementary to particular subsequences of the
genes whose allelic markers they are designed to detect. Thus, the test probes are
capable of specifically hybridizing to the target nucleic acid they are designed to detect.

The term "perfect match probe" refers to a probe which has a sequence designed 
to be perfectly complementary to a particular target sequence. The probe is typically 
perfectly complementary to a portion (subsequence) of the target sequence. The perfect
match probe can be a "test probe," a "normalization control probe," an expression level
control probe and the like. A perfect match control or perfect match probe is, however,
distinguished from a "mismatch control" or "mismatch probe" or "mismatch control
probe."

In addition to test probes that bind the target nucleic acid(s) of interest, the high
density array can contain a number of control probes. The control probes fall into two
categories: normalization controls and mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are
complementary to labeled reference oligonucleotides or other nucleic acid sequences
that are added to the nucleic acid sample. The signals obtained from the normalization
controls after hybridization provide a control for variations in hybridization conditions,
label intensity, "reading" efficiency, and other factors that may cause the signal of a
perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g.,
fluorescence intensity) read from all other probes in the array are divided by the signal
(e.g., fluorescence intensity) from the control probes, thereby normalizing the
measurements.
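
As a non-limiting illustration, the normalization described above can be sketched in
Python. The function name, the dictionary layout, and the choice to average several
control probes into one divisor are assumptions made for this example only; the
disclosure itself states only that probe signals are divided by the control signal.

```python
def normalize_signals(probe_signals, control_signals):
    """Divide every probe signal by the mean normalization-control signal.

    probe_signals: dict mapping a probe id to its raw intensity (assumed layout).
    control_signals: raw intensities read from the normalization control probes.
    """
    if not control_signals:
        raise ValueError("at least one normalization control is required")
    control_mean = sum(control_signals) / len(control_signals)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return {pid: value / control_mean for pid, value in probe_signals.items()}


# Hypothetical intensities, in arbitrary fluorescence units.
raw = {"tag_1": 1200.0, "tag_2": 300.0}
normalized = normalize_signals(raw, control_signals=[580.0, 620.0])
```

Because each array is divided by its own control signal, normalized values from
different arrays can be compared on a common scale.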

Virtually any probe can serve as a normalization control. However, it is
recognized that hybridization efficiency varies with base composition and probe length.
Preferred normalization probes are selected to reflect the average length of the other
probes present in the array; however, they can be selected to cover a range of lengths.
The normalization control(s) can also be selected to reflect the (average) base
composition of the other probes in the array; however, in a preferred embodiment, only
one or a few normalization probes are used and they are selected such that they
hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

Mismatch controls can also be provided for the probes to the target alleles or for 
normalization controls. The terms "mismatch control" or "mismatch probe" or
"mismatch control probe" refer to a probe whose sequence is deliberately selected not to
be perfectly complementary to a particular target sequence. Mismatch controls are
oligonucleotide probes or other nucleic acid probes identical to their corresponding test
or control probes except for the presence of one or more mismatched bases. A
mismatched base is a base selected so that it is not complementary to the corresponding
base in the target sequence to which the probe would otherwise specifically hybridize.
One or more mismatches are selected such that under appropriate hybridization
conditions (e.g., stringent conditions) the test or control probe would be expected to
hybridize with its target sequence, but the mismatch probe would not hybridize (or
would hybridize to a significantly lesser extent). Preferred mismatch probes contain a
central mismatch. Thus, for example, where a probe is a 20 mer, a corresponding
mismatch probe will have the identical sequence except for a single base mismatch
(e.g., substituting a G, a C, or a T for an A) at any of positions 6 through 14 (the central
mismatch).
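
By way of example only, the construction of central mismatch probes for a 20 mer can
be sketched as follows. The helper names and the example sequence are hypothetical;
positions are counted from 1 to match the wording "positions 6 through 14" above.

```python
BASES = "ACGT"


def mismatch_probe(perfect_match, position, substitute):
    """Return the probe with the base at a 1-based position replaced."""
    if not 1 <= position <= len(perfect_match):
        raise ValueError("position lies outside the probe")
    original = perfect_match[position - 1]
    if substitute not in BASES or substitute == original:
        raise ValueError("substitute must be a different base")
    return perfect_match[: position - 1] + substitute + perfect_match[position:]


def central_mismatch_probes(perfect_match, first=6, last=14):
    """List every single-base mismatch probe within the central window."""
    probes = []
    for position in range(first, last + 1):
        for base in BASES:
            if base != perfect_match[position - 1]:
                probes.append(mismatch_probe(perfect_match, position, base))
    return probes


pm = "ACGTACGTACGTACGTACGT"  # hypothetical 20 mer perfect match probe
mm = mismatch_probe(pm, position=10, substitute="T")
candidates = central_mismatch_probes(pm)
```

Each of the nine central positions admits three substitutions, so the sketch lists 27
candidate mismatch probes, from which one would typically be chosen.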

For each mismatch control in a high-density array there typically exists a
corresponding perfect match probe that is perfectly complementary to the same
particular target sequence. The mismatch may comprise one or more bases. While the
mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are
less desirable, as a terminal mismatch is less likely to prevent hybridization of the target
sequence. In a particularly preferred embodiment, the mismatch is located at or near the
center of the probe such that the mismatch is most likely to destabilize the duplex with
the target sequence under the test hybridization conditions.

Mismatch probes provide a control for non-specific binding or cross-hybridization
to a nucleic acid in the sample other than the target to which the probe is
directed. Mismatch probes thus indicate whether or not a hybridization is specific. For
example, if the target is present, the perfect match probes should be consistently
brighter than the mismatch probes. The difference in intensity between the perfect
match and the mismatch probe (I(PM) - I(MM)) provides a good measure of the
concentration of the hybridized material.
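
A minimal sketch of this comparison is given below. The function name, the averaging
over several probe pairs, and the intensity values are assumptions for illustration
and are not prescribed by the disclosure.

```python
def pm_mm_summary(pairs):
    """Summarize perfect match (PM) versus mismatch (MM) intensities.

    pairs: list of (pm_intensity, mm_intensity) tuples for one target.
    """
    if not pairs:
        raise ValueError("at least one PM/MM pair is required")
    differences = [pm - mm for pm, mm in pairs]
    return {
        # Average I(PM) - I(MM), taken here as the measure of hybridized material.
        "mean_difference": sum(differences) / len(differences),
        # True only when every PM probe is brighter than its MM partner.
        "pm_consistently_brighter": all(d > 0 for d in differences),
    }


summary = pm_mm_summary([(900.0, 200.0), (750.0, 150.0), (820.0, 220.0)])
```

A target for which "pm_consistently_brighter" is false would be a candidate for
non-specific binding or cross-hybridization rather than specific hybridization.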

The array can also include sample preparation/amplification control probes. 
These are probes that are complementary to subsequences of control genes selected 
because they do not normally occur in the nucleic acids of the particular biological
sample being assayed. Suitable sample preparation/amplification control probes
include, for example, probes to bacterial genes (e.g., Bio B) where the sample in
question is from a eukaryote.

In a preferred embodiment, oligonucleotide probes in the high density array are
selected to bind specifically to the nucleic acid target to which they are directed with
minimal non-specific binding or cross-hybridization under the particular hybridization
conditions utilized. Because the high density arrays of this invention can contain in
excess of 100,000 or even 1,000,000 different probes, it is possible to provide every
probe of a characteristic length that binds to a particular nucleic acid sequence.

Forming High Density Arrays

High density arrays are particularly useful for monitoring the presence of allelic
markers. The fabrication and application of high density arrays in gene expression
monitoring have been disclosed previously in, for example, WO 97/10365, WO
92/10583, U.S. Application Ser. No. 08/772,376 filed December 23, 1996; serial
number 08/529,115 filed on September 15, 1995; serial number 08/168,904 filed
December 15, 1993; serial number 07/624,114 filed on December 6, 1990; serial
number 07/362,901 filed June 7, 1990; and in U.S. 5,677,195, all incorporated herein for
all purposes by reference. In some embodiments using high density arrays, high
density oligonucleotide arrays are synthesized using methods such as the Very Large
Scale Immobilized Polymer Synthesis (VLSIPS) disclosed in U.S. Pat. No. 5,445,934,
incorporated herein for all purposes by reference. Each oligonucleotide occupies a
known location on a substrate. A nucleic acid target sample is hybridized with a high
density array of oligonucleotides and then the amount of target nucleic acids hybridized
to each probe in the array is quantified.

Synthesized oligonucleotide arrays are particularly preferred for this invention. 
Oligonucleotide arrays have numerous advantages over other methods, such as
efficiency of production, reduced intra- and inter-array variability, increased information
content, and high signal-to-noise ratio.

Preferred high density arrays comprise greater than about 100, preferably greater
than about 1000, more preferably greater than about 16,000, and most preferably greater
than 65,000 or 250,000 or even greater than about 1,000,000 different oligonucleotide
probes, preferably in less than 1 cm² of surface area. The oligonucleotide probes range
from about 5 to about 50 or about 500 nucleotides, more preferably from about 10 to
about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in
length.

Methods of forming high density arrays of oligonucleotides, peptides and other
polymer sequences with a minimal number of synthetic steps are known. The
oligonucleotide analogue array can be synthesized on a solid substrate by a variety of
methods, including, but not limited to, light-directed chemical coupling and
mechanically directed coupling. See Pirrung et al., U.S. Patent No. 5,143,854 (see also
PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO
92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523, which disclose methods of
forming vast arrays of peptides, oligonucleotides and other molecules using,
for example, light-directed synthesis techniques. See also Fodor et al., Science,
251, 767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as
VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of
polymers is converted, through simultaneous coupling at a number of reaction sites, into
a different heterogeneous array. See U.S. Application Serial Nos. 07/796,243 and
07/980,523.

The development of VLSIPS™ technology as described in the above-noted U.S.
Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092 is
considered pioneering technology in the fields of combinatorial synthesis and screening
of combinatorial libraries. More recently, patent application Serial No. 08/082,937,
filed June 25, 1993, describes methods for making arrays of oligonucleotide probes that
can be used to check or determine a partial or complete sequence of a target nucleic acid
and to detect the presence of a nucleic acid containing a specific oligonucleotide
sequence.

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a
glass surface proceeds using automated phosphoramidite chemistry and chip masking
techniques. In one specific implementation, a glass surface is derivatized with a silane
reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a
photolabile protecting group. Photolysis through a photolithographic mask is used
selectively to expose functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with
those sites which are illuminated (and thus exposed by removal of the photolabile
blocking group). Thus, the phosphoramidites only add to those areas selectively
exposed from the preceding step. These steps are repeated until the desired array of
sequences has been synthesized on the solid surface. Combinatorial synthesis of
different oligonucleotide analogues at different locations on the array is determined by
the pattern of illumination during synthesis and the order of addition of coupling
reagents.
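
As a simplified, non-limiting illustration of how the pattern of illumination and the
order of reagent addition together determine the sequence at each site, the following
Python sketch models each cycle as a mask (the set of illuminated sites) paired with
one base. The mask scheme shown, which builds every sequence of a given length in
four cycles per position, is an assumption chosen for the example; the sketch ignores
coupling chemistry, synthesis direction, and yield.

```python
BASES = "ACGT"


def combinatorial_masks(length):
    """Yield (mask, base) cycles that build every sequence of `length` bases.

    A mask is the set of site indices illuminated (deprotected) in that cycle.
    """
    n_sites = 4 ** length
    for position in range(length):
        for index, base in enumerate(BASES):
            mask = {s for s in range(n_sites) if (s // 4 ** position) % 4 == index}
            yield mask, base


def synthesize(length):
    """Grow one sequence per site by applying the cycles in order."""
    sequences = [""] * (4 ** length)
    for mask, base in combinatorial_masks(length):
        for site in mask:  # only illuminated sites couple the incoming base
            sequences[site] += base
    return sequences


dimers = synthesize(2)  # 16 sites, 8 coupling cycles
```

With this scheme the number of cycles grows linearly with probe length (four per
position) while the number of distinct sequences grows as a power of four.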



In the event that an oligonucleotide analogue with a polyamide backbone is used
in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite
chemistry to perform the synthetic steps, since the monomers do not attach to one
another via a phosphate linkage. Instead, peptide synthetic methods are substituted.
See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854.

Peptide nucleic acids are commercially available from, e.g., Biosearch, Inc.
(Bedford, MA) which comprise a polyamide backbone and the bases found in naturally
occurring nucleosides. Peptide nucleic acids are capable of binding to nucleic acids
with high specificity, and are considered "oligonucleotide analogues" for purposes of
this disclosure.

Additional methods which can be used to generate an array of oligonucleotides 
on a single substrate are described in co-pending Applications Ser. No. 07/980,523,
filed November 20, 1992, and 07/796,243, filed November 22, 1991 and in PCT
Publication No. WO 93/09668. In the methods disclosed in these applications, reagents
are delivered to the substrate by either (1) flowing within a channel defined on
predefined regions or (2) "spotting" on predefined regions or (3) through the use of
photoresist. However, other approaches, as well as combinations of spotting and
flowing, can be employed. In each instance, certain activated regions of the substrate
are mechanically separated from other regions when the monomer solutions are
delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of the
present invention can generally be described as follows. Diverse polymer sequences are
synthesized at selected regions of a substrate or solid support by forming flow channels
on a surface of the substrate through which appropriate reagents flow or in which
appropriate reagents are placed. For example, assume a monomer "A" is to be bound to
the substrate in a first group of selected regions. If necessary, all or part of the surface
of the substrate in all or a part of the selected regions is activated for binding by, for
example, flowing appropriate reagents through all or some of the channels, or by
washing the entire substrate with appropriate reagents. After placement of a channel
block on the surface of the substrate, a reagent having the monomer A flows through or
is placed in all or some of the channel(s). The channels provide fluid contact to the first
selected regions, thereby binding the monomer A on the substrate directly or indirectly
(via a spacer) in the first selected regions.

Thereafter, a monomer "B" is coupled to second selected regions, some of which
can be included among the first selected regions. The second selected regions will be in
fluid contact with a second flow channel(s) through translation, rotation, or replacement
of the channel block on the surface of the substrate; through opening or closing a
selected valve; or through deposition of a layer of chemical or photoresist. If necessary,
a step is performed for activating at least the second regions. Thereafter, the monomer
B is flowed through or placed in the second flow channel(s), binding monomer B at the
second selected locations. In this particular example, the resulting sequences bound to
the substrate at this stage of processing will be, for example, A, B, and AB. The process
is repeated to form a vast array of sequences of desired length at known locations on the
substrate.
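
The two-step example above can be traced with a short Python sketch. The region
labels and the function name are hypothetical, and activation and washing steps are
omitted for brevity.

```python
def flow_channel_coupling(steps):
    """Apply (monomer, regions) coupling steps in order.

    Returns a dict mapping each region to the sequence bound there.
    """
    bound = {}
    for monomer, regions in steps:
        for region in regions:
            # A region in fluid contact with the channel gains the monomer.
            bound[region] = bound.get(region, "") + monomer
    return bound


result = flow_channel_coupling([
    ("A", ["region_1", "region_2"]),  # first selected regions
    ("B", ["region_2", "region_3"]),  # second selected regions overlap the first
])
```

Here region_1 carries A, region_3 carries B, and the overlapping region_2 carries AB,
matching the sequences named in the text.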

After the substrate is activated, monomer A can be flowed through some of the
channels, monomer B can be flowed through other channels, a monomer C can be
flowed through still other channels, etc. In this manner, many or all of the reaction
regions are reacted with a monomer before the channel block must be moved or the
substrate must be washed and/or reactivated. By making use of many or all of the 
available reaction regions simultaneously, the number of washing and activation steps 
can be minimized. 

One of skill in the art will recognize that there are alternative methods of
forming channels or otherwise protecting a portion of the surface of the substrate. For
example, according to some embodiments, a protective coating such as a hydrophilic or
hydrophobic coating (depending upon the nature of the solvent) is utilized over portions
of the substrate to be protected, sometimes in combination with materials that facilitate
wetting by the reactant solution in other regions. In this manner, the flowing solutions
are further prevented from passing outside of their designated flow paths.

High density nucleic acid arrays can be fabricated by depositing presynthesized
or natural nucleic acids in predetermined positions. Synthesized or natural nucleic
acids are deposited on specific locations of a substrate by light directed targeting and
oligonucleotide directed targeting. Nucleic acids can also be directed to specific
locations in much the same manner as the flow channel methods. For example, a
nucleic acid A can be delivered to and coupled with a first group of reaction regions
which have been appropriately activated. Thereafter, a nucleic acid B can be delivered
to and reacted with a second group of activated reaction regions. Nucleic acids are
deposited in selected regions. Another embodiment uses a dispenser that moves from
region to region to deposit nucleic acids in specific spots. Typical dispensers include a
micropipet or capillary pin to deliver nucleic acid to the substrate and a robotic system
to control the position of the micropipet with respect to the substrate. In other
embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes
or capillary pins, or the like so that various reagents can be delivered to the reaction
regions simultaneously.

Hybridization Conditions

The term "stringent conditions" refers to conditions under which a probe will
hybridize to its target subsequence, but with only insubstantial hybridization to other
sequences or to other sequences such that the difference may be identified. Stringent
conditions are sequence-dependent and will be different in different circumstances.
Longer sequences hybridize specifically at higher temperatures. Generally, stringent
conditions are selected to be about 5°C lower than the thermal melting point (Tm) for
the specific sequence at a defined ionic strength and pH.

The Tm is the temperature, under defined ionic strength, pH, and nucleic acid
concentration, at which 50% of the probes complementary to the target sequence
hybridize to the target sequence at equilibrium. As the target sequences are generally
present in excess, at Tm, 50% of the probes are occupied at equilibrium. Typically,
stringent conditions will be those in which the salt concentration is at least about 0.01 to
1.0 M concentration of a Na or other salt at pH 7.0 to 8.3 and the temperature is at least
about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also
be achieved with the addition of destabilizing agents such as formamide.
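
For illustration only, the rule of selecting a temperature about 5°C below the Tm can
be sketched as follows. The disclosure does not specify how Tm is estimated; the
sketch assumes the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is a
rough approximation for short oligonucleotides in solution and may not hold for
array-bound probes or for other salt conditions. Both function names are hypothetical.

```python
def wallace_tm(probe):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C).

    Assumption: Wallace rule, a rule of thumb for short oligonucleotides only.
    """
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if at + gc != len(probe):
        raise ValueError("probe must contain only A, C, G, and T")
    return 2 * at + 4 * gc


def stringent_temperature(probe, offset=5):
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm(probe) - offset


probe = "ACGTACGTACGTACGTACGT"  # hypothetical 20 mer with 50% G+C
tm = wallace_tm(probe)
wash_temp = stringent_temperature(probe)
```

In practice the estimate would be replaced by a measured Tm or by a model suited to
the actual ionic strength, pH, and probe chemistry.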

The phrase "hybridizing specifically to" refers to the binding, duplexing, or
hybridizing of a molecule substantially to or only to a particular nucleotide sequence or
sequences under stringent conditions when that sequence is present in a complex
mixture (e.g., total cellular) of DNA or RNA. It is generally recognized that nucleic
acids are denatured by increasing the temperature or decreasing the salt concentration of
the buffer containing the nucleic acids. Under low stringency conditions (e.g., low
temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or
RNA:DNA) will form even where the annealed sequences are not perfectly
complementary. Thus, specificity of hybridization is reduced at lower stringency.
Conversely, at higher stringency (e.g., higher temperature or lower salt) successful
hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions can be
selected to provide any degree of stringency. In a preferred embodiment, hybridization
is performed at low stringency, in this case in 6X SSPE-T at 37°C (0.005% Triton
X-100), to ensure hybridization, and then subsequent washes are performed at higher
stringency (e.g., 1 X SSPE-T at 37°C) to eliminate mismatched hybrid duplexes.
Successive washes can be performed at increasingly higher stringency (e.g., down to as
low as 0.25 X SSPE-T at 37°C to 50°C) until a desired level of hybridization specificity
is obtained. Stringency can also be increased by addition of agents such as formamide.
Hybridization specificity can be evaluated by comparison of hybridization to the test
probes with hybridization to the various controls that can be present (e.g., expression
level control, normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and
signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest
stringency that produces consistent results and that provides a signal intensity greater
than approximately 10% of the background intensity. Thus, in a preferred embodiment,
the hybridized array can be washed at successively higher stringency solutions and read
between each wash. Analysis of the data sets thus produced will reveal a wash
stringency above which the hybridization pattern is not appreciably altered and which
provides adequate signal for the particular oligonucleotide probes of interest.

The stability of duplexes formed between RNAs or DNAs is generally in the
order of RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes have better
duplex stability with a target, but poorer mismatch discrimination than shorter probes
(mismatch discrimination refers to the measured hybridization signal ratio between a
perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers)
discriminate mismatches very well, but the overall duplex stability is low.

Altering the thermal stability (Tm) of the duplex formed between the target and
the probe using, e.g., known oligonucleotide analogues allows for optimization of
duplex stability and mismatch discrimination. One useful aspect of altering the Tm
arises from the fact that adenine-thymine (A-T) duplexes have a lower Tm than
guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have two
hydrogen bonds per base pair, while the G-C duplexes have three hydrogen bonds per
base pair. In heterogeneous oligonucleotide arrays in which there is a non-uniform
distribution of bases, it is not generally possible to optimize hybridization for each
oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to
selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes.
This can be accomplished, e.g., by substituting guanine residues in the probes of an
array which form G-C duplexes with hypoxanthine, or by substituting adenine residues
in probes which form A-T duplexes with 2,6 diaminopurine or by using tetramethyl
ammonium chloride (TMACl) in place of NaCl.

Altered duplex stability conferred by using oligonucleotide analogue probes can 
be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide
analogue arrays hybridized with a target oligonucleotide over time. The data allow
optimization of specific hybridization conditions at, e.g., room temperature.

Another way of verifying altered duplex stability is by following the signal
intensity generated upon hybridization with time. Previous experiments using DNA
targets and DNA chips have shown that signal intensity increases with time, and that the 
more stable duplexes generate higher signal intensities faster than less stable duplexes. 
The signals reach a plateau or "saturate" after a certain amount of time due to all of the 
binding sites becoming occupied. These data allow for optimization of hybridization, 
and determination of the best conditions at a specified temperature. 

Methods of optimizing hybridization conditions are well known to those of skill
in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology,
Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y.,
(1993)).

Signal Detection 

The hybridized nucleic acids can be detected by detecting one or more labels 
attached to the target nucleic acids. The labels can be incorporated by any of a number 
of means well known to those of skill in the art. However, in a preferred embodiment, 
the label is incorporated by labeling the primers prior to the amplification step in the
preparation of the target nucleic acids. Thus, for example, polymerase chain reaction
with labeled primers will provide a labeled amplification product.

Detectable labels suitable for use in the present invention include any 
composition detectable by spectroscopic, photochemical, biochemical,
immunochemical, electrical, optical, or chemical means. Useful labels in the present
invention include biotin for staining with labeled streptavidin conjugate, magnetic beads
(e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, texas red, rhodamine, green
fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes
(e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an
ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g.,
polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels
include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437;
4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus,
for example, radiolabels can be detected using photographic film or scintillation
counters, and fluorescent markers can be detected using a photodetector to detect emitted
light. Enzymatic labels are typically detected by providing the enzyme with a substrate
and detecting the reaction product produced by the action of the enzyme on the
substrate, and colorimetric labels are detected by simply visualizing the colored label.
One method uses colloidal gold label that can be detected by measuring scattered light.

Means of detecting labeled target nucleic acids hybridized to the probes of the 
array are known to those of skill in the art. Thus, for example, where a colorimetric
label is used, simple visualization of the label is sufficient. Where a radioactive labeled
probe is used, detection of the radiation (e.g., with photographic film or a solid state
detector) is sufficient.

Detection of target nucleic acids which are labeled with a fluorescent label (i.e.,
a "color tag") can be accomplished with fluorescence microscopy. The hybridized array
can be excited with a light source at the excitation wavelength of the particular
fluorescent label and the resulting fluorescence at the emission wavelength is detected.
The excitation light source can be a laser appropriate for the excitation of the
fluorescent label. 

The confocal microscope can be automated with a computer-controlled stage to
automatically scan the entire high density array, i.e., to sequentially examine individual
probes or adjacent groups of probes in a systematic manner until all probes have been
examined. Similarly, the microscope can be equipped with a phototransducer (e.g., a
photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data
acquisition system to automatically record the fluorescence signal produced by
hybridization to each oligonucleotide probe on the array. Such automated systems are
described at length in U.S. Patent No. 5,143,854, PCT Application 20 91 10092, and
copending U.S. Application Ser. No. 08/195,889, filed on February 10, 1994. Use of
laser illumination in conjunction with automated confocal microscopy for signal
detection permits detection at a resolution of better than about 100 µm, more preferably
better than about 50 µm, and most preferably better than about 25 µm.

Two different fluorescent labels can be used in order to distinguish two alleles at 
each marker examined. In such a case, the array can be scanned two times. During the 
first scan, the excitation and emission wavelengths are set as required to detect one of 
the two fluorescent labels. For the second scan, the excitation and emission 
wavelengths are set as required to detect the second fluorescent label. When the results 
from both scans are compared, the genotype identification or allele frequency can be
determined. 
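
One possible way to compare the two scans is sketched below in Python. The decision
rule, the 0.8 cutoff, and the background subtraction are assumptions introduced for
illustration; the disclosure states only that the results from both scans are
compared.

```python
def call_genotype(signal_1, signal_2, background=0.0, homozygous_fraction=0.8):
    """Call a biallelic genotype from the two color signals at one marker.

    homozygous_fraction is a hypothetical cutoff on the allele 1 share of
    the total background-corrected signal.
    """
    s1 = max(signal_1 - background, 0.0)
    s2 = max(signal_2 - background, 0.0)
    total = s1 + s2
    if total == 0:
        return "no call", None
    fraction_1 = s1 / total  # also usable as an allele frequency estimate
    if fraction_1 >= homozygous_fraction:
        genotype = "allele 1 homozygote"
    elif fraction_1 <= 1 - homozygous_fraction:
        genotype = "allele 2 homozygote"
    else:
        genotype = "heterozygote"
    return genotype, fraction_1


first = call_genotype(950.0, 60.0, background=50.0)
second = call_genotype(500.0, 450.0, background=50.0)
third = call_genotype(50.0, 50.0, background=50.0)
```

For a pooled sample, the returned fraction rather than the categorical call would be
the quantity of interest, since it approximates the frequency of the first allele.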

Quantification and Determination of Genotypes 

The term "quantifying" when used in the context of quantifying hybridization of
a nucleic acid sequence or subsequence can refer to absolute or to relative
quantification. Absolute quantification can be accomplished by inclusion of known
concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such as
Bio B, or known amounts of the target nucleic acids themselves) and referencing the
hybridization intensity of unknowns with the known target nucleic acids (e.g., through
generation of a standard curve). Alternatively, relative quantification can be
accomplished by comparison of hybridization signals between two or more genes, or
between two or more treatments to quantify the changes in hybridization intensity and,
by implication, the frequency of an allele. Relative quantification can also be used to
merely detect the presence or absence of an allele in the target nucleic acids. In one
embodiment, for example, the presence or absence of the two alleles of a marker can be
determined by comparing the quantities of the first and second color tag at the known
locations in the array, i.e., on the solid support, which correspond to the allele-specific
probes for the two alleles.
A preferred quantifying method is to use a confocal microscope and fluorescent
labels. The GeneChip system (Affymetrix, Santa Clara, CA) is particularly suitable
for quantifying the hybridization; however, it will be apparent to those of skill in the art
that any similar system or other effectively equivalent detection method can also be
used.

Methods for evaluating the hybridization results vary with the nature of the
specific probes used, as well as the controls. Simple quantification of the fluorescence
intensity for each probe can be determined. This can be accomplished simply by
measuring signal strength at each location (representing a different probe) on the high
density array (e.g., where the label is a fluorescent label, detection of the fluorescence
intensity produced by a fixed excitation illumination at each location on the array).

One of skill in the art, however, will appreciate that hybridization signals will
vary in strength with efficiency of hybridization, the amount of label on the sample
nucleic acid and the amount of the particular nucleic acid in the sample. Typically
nucleic acids present at very low levels (e.g., < 1 pM) will show a very weak signal. At
some low level of concentration, the signal becomes virtually indistinguishable from
background. In evaluating the hybridization data, a threshold intensity value can be
selected below which a signal is counted as being essentially indistinguishable from
background.

Background signals result from non-specific binding, or other interactions, between
the labeled target nucleic acids and components of the array (e.g., the
oligonucleotide probes, control probes, the array substrate, etc.). A single
background signal can be calculated for the entire array, or a different
background signal may be calculated for each target allele. In a preferred
embodiment, background is calculated as the average hybridization signal intensity for
the lowest 5% to 10% of the probes in the array, or, where a different background
signal is calculated for each target allele, for the lowest 5% to 10% of the probes for each
allele. However, where the probes to a particular allele hybridize well and thus appear
to be specifically binding to a target sequence, they should not be used in a background
signal calculation. Alternatively, background may be calculated as the average
hybridization signal intensity produced by hybridization to probes that are not
complementary to any sequence found in the sample (e.g., probes directed to nucleic
acids of the opposite sense or to genes not found in the sample, such as bacterial genes
where the sample is mammalian nucleic acids). Background can also be calculated as
the average signal intensity produced by regions of the array that lack any probes at all.

In a preferred embodiment, background signal is reduced by the use of a detergent (e.g.,
C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the
hybridization to reduce non-specific binding. In a particularly preferred embodiment,
the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring
sperm DNA). The use of blocking agents in hybridization is well known to those of
skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

The high density array can include mismatch controls. In a preferred
embodiment, there is a mismatch control having a central mismatch for every probe in
the array, except the normalization controls. It is expected that after washing in
stringent conditions, where a perfect match would be expected to hybridize to the probe
but not to the mismatch, the signal from the mismatch controls should only reflect
non-specific binding or the presence in the sample of a nucleic acid that hybridizes with
the mismatch. Where both the probe in question and its corresponding mismatch control
show high signals, or the mismatch shows a higher signal than its corresponding test
probe, there is a problem with the hybridization and the signal from those probes is
ignored. For a given marker, the difference in hybridization signal intensity (I_allele1 -
I_allele2) between an allele-specific probe (perfect match probe) for a first allele and the
corresponding probe for a second allele (or other mismatch control probe) is a measure
of the presence of or concentration of the first allele. Thus, in a preferred embodiment,
the signal of the mismatch probe is subtracted from the signal for its corresponding test
probe to provide a measure of the signal due to specific binding of the test probe.

The concentration of a particular sequence can then be determined by measuring
the signal intensity of each of the probes that bind specifically to that gene and
normalizing to the normalization controls. Where the signal from the probes is greater
than the mismatch, the mismatch is subtracted. Where the mismatch intensity is equal
to or greater than its corresponding test probe, the signal is ignored (i.e., the signal
cannot be evaluated).
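
The background and mismatch rules described above can be illustrated with a short
Python sketch. It is not part of the disclosed method: the function names, the default
5% fraction, and the use of a plain arithmetic mean are assumptions made for
illustration only.

```python
def background_level(intensities, fraction=0.05):
    """Mean of the lowest `fraction` of probe intensities (at least one probe).

    `fraction` is an illustrative parameter; the text gives 5% to 10%.
    """
    if not intensities:
        raise ValueError("no intensities given")
    ordered = sorted(intensities)
    n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n]) / n


def specific_signal(pm, mm):
    """Perfect-match minus mismatch intensity.

    Returns None when the mismatch is equal to or brighter than the test
    probe, i.e. the case the text says cannot be evaluated.
    """
    if mm >= pm:
        return None
    return pm - mm


if __name__ == "__main__":
    probes = list(range(1, 21))           # 20 made-up intensities
    print(background_level(probes, 0.1))  # mean of the two dimmest probes
    print(specific_signal(500, 120))
    print(specific_signal(100, 100))
```

Returning None rather than zero for an unusable probe pair keeps "ignored" signals
distinct from genuinely weak ones in any later averaging.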

For each marker analyzed, the genotype can be unambiguously determined by
comparing the hybridization patterns obtained for each of the two labels, e.g., color tags
employed (Fig. 3). If hybridization is indicated for one color tag to its corresponding
allele-specific probe (e.g., "A") but not for the other color tag (e.g., "G") (pattern at left
in Fig. 3), then the indicated genotype of a diploid organism would be homozygous
A/A. If hybridization is indicated only for the other color tag to its corresponding
allele-specific probe (e.g., "G") (pattern at center in Fig. 3), then the indicated genotype
of a diploid organism would be homozygous G/G. If hybridization is indicated for both
color tags to their corresponding allele-specific probes (pattern at right in Fig. 3), then
the indicated genotype of a diploid organism would be heterozygous (A/G).
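
The three hybridization patterns above amount to a small decision rule. The following
Python sketch shows one way to express it; the function name, the single shared
`threshold` argument, and the None return for "no call" are assumptions for
illustration and are not taken from the disclosure.

```python
def call_genotype(signal_a, signal_b, threshold, allele_a="A", allele_b="G"):
    """Call a diploid genotype from the two color-tag signals at one marker.

    A signal counts as hybridization only if it exceeds `threshold`
    (a hypothetical cutoff above background).
    """
    a_present = signal_a > threshold
    b_present = signal_b > threshold
    if a_present and b_present:
        return f"{allele_a}/{allele_b}"   # heterozygous
    if a_present:
        return f"{allele_a}/{allele_a}"   # homozygous for the first allele
    if b_present:
        return f"{allele_b}/{allele_b}"   # homozygous for the second allele
    return None                           # neither tag detected: no call


if __name__ == "__main__":
    print(call_genotype(900, 20, threshold=100))
    print(call_genotype(15, 850, threshold=100))
    print(call_genotype(600, 640, threshold=100))
```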

Marginal detection of hybridization, indicated by an intermediate positive result
(e.g., less than 1%, or from 1-5%, or from 1-10%, or from 2-10%, or from 5-10%, or
from 1-20%, or from 2-20%, or from 5-20%, or from 10-20% of the average of all
positive hybridization results obtained for the entire array) may indicate either
cross-hybridization or cross-amplification, depending on the overall hybridization
pattern as indicated in Fig. 3. However, these can be distinguished by the unique pattern
observed. Further procedures for data analysis are disclosed in U.S. Application
03/"-;.376, previously incorporated for all purposes.

HuSNP and other marker-specific arrays have been designed and used in genetic
studies 9, 10. The method developed in this study, however, provides several advantages
for many different genetic applications: (1) arrays based on a single generic
design can be used to genotype different sets of genetic markers because no specific
customized genotyping array is needed; (2) the pre-selected probe sequences
synthesized on the tag array help ensure good hybridization results; (3) accurate
quantitative measurement of the allele frequency in the tested samples can be achieved.
Thus, reliable genotype results can be obtained not only for individual samples, but also
for pooled samples. Besides SBE, other assays can be coupled with the tag array assay, for
example, the oligonucleotide ligation assay (OLA) 21, the invasive cleavage of
oligonucleotide probes assay 22, and allele-specific PCR 23.

Our current tag chip contains over 32,000 unique tag probes. Most genetic
applications, for example detecting mutations in one particular gene, do not need
such a high-density chip. Therefore, smaller chips with fewer tags are
desirable. Alternatively, multiple tags corresponding to one particular marker can be
designed to build redundancy into the assay and assure accurate genotyping. Or
multiple sets of tags for one set of SNPs can be designed, so that multiple samples can be
processed and analyzed with one chip. Our current assay uses a two-color labeling
scheme, but a four-color labeling/scanning system should allow the assay to be done
in a single-tube reaction.

For broader genetic applications, for example a study that needs to genotype hundreds to
thousands of genetic markers, amplifying the genetic loci with multiplex PCR is still the
best strategy. However, to genotype thousands to tens of thousands of markers,
pre-amplification of the genetic loci of interest will be very labor-intensive and costly.
A whole-genome approach should be explored, for example, strategies that use total human
genomic DNA directly, or genomic DNA amplified using some general amplification
method, e.g., primer-extension preamplification (PEP) 25, or total cDNA. In fact, we have
tried to use total human genomic DNA directly as the SBE template in our tag array
assay. 21 out of the 38 markers that we tested gave good signals (data not shown).
Nevertheless, a large amount of work is needed to solve both the sensitivity (signal
intensity) and specificity (mis-priming) problems before the whole-genome approach
becomes really useful.

The invention will be further illustrated by the following non-limiting examples.
The content of references cited herein is incorporated herein by reference in its entirety.

EXEMPLIFICATION 

METHODS 

Collection and Isolation of DNA From Samples 

DNA samples were collected by GenNet as part of the ongoing Family Blood
Pressure Program. Samples were collected with consent and IRB approval in both
Tecumseh, MI and Loyola, IL families. Ascertainment was based on identification
of a proband in the top 15th (Tecumseh) or 20th (Loyola) percentile of the community's
blood pressure distribution. Full phenotypic information was obtained for each
individual. DNA was extracted from 5-10 ml of whole blood taken from each individual
using the standard "salting-out" method (Gentra Systems).

Primer Design 

For each SNP, primary PCR amplification primers were designed as described
previously 5. The SBE primer was designed so that its 3' end terminates one base
before the polymorphic site. The Primer 3.0 software package
(http://www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi) was modified and used to
pick SBE primers from batch sequences, at a predicted length of 20 (ranging from 18 to
26) nucleotides and melting temperature of 60°C (ranging from 56°C to 64°C). The SBE
primers were always picked from the forward direction first (i.e., 5' to the polymorphic
site). If the SBE primer cannot be picked from the forward direction, the reverse
direction is tried.
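
The selection logic described above (3' end one base before the polymorphic site,
length 18 to 26 with a target of 20, melting temperature 56°C to 64°C with a target of
60°C, forward strand first) can be sketched as follows. This is not the modified
Primer 3.0 code used in the study: the function names are hypothetical, and the melting
temperature is estimated with the simple Wallace rule (2°C per A/T, 4°C per G/C), which
is an assumption swapped in to keep the example self-contained.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule); a stand-in for Primer3's calculation."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def pick_from_strand(strand, snp_index):
    """Best primer on `strand` (5'->3') ending one base before `snp_index`."""
    best = None
    for length in range(18, 27):
        start = snp_index - length
        if start < 0:
            break  # not enough flanking sequence for this or longer primers
        primer = strand[start:snp_index]
        tm = wallace_tm(primer)
        if 56 <= tm <= 64:
            # prefer Tm closest to 60, then length closest to 20
            key = (abs(tm - 60), abs(length - 20))
            if best is None or key < best[0]:
                best = (key, primer)
    return best[1] if best else None


def pick_sbe_primer(sequence, snp_index):
    """Try the forward strand first, then the reverse strand."""
    primer = pick_from_strand(sequence, snp_index)
    if primer is not None:
        return primer, "forward"
    rc_index = len(sequence) - 1 - snp_index  # SNP position on the other strand
    primer = pick_from_strand(reverse_complement(sequence), rc_index)
    if primer is not None:
        return primer, "reverse"
    return None, None
```

For longer or unusually composed primers a nearest-neighbor Tm model would be more
appropriate than the Wallace rule used here.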



Multiplexing PCR 

Specific genomic regions containing the 144 SNPs were amplified with 9
multiplex PCR reactions, each containing 50 ng of human genomic DNA, 0.1 µM of each
primer, 1 mM deoxynucleotide triphosphates (dNTPs), 10 mM Tris-HCl (pH 8.3), 50
mM KCl, 5 mM MgCl2 and 2 units of AmpliTaq Gold (Perkin Elmer) in a total volume of
25 µl. PCR was performed on a Thermo Cycler (MJ Research), with initial denaturation
of the DNA templates and Taq enzyme activation at 96°C for 10 minutes; followed by
40 cycles of denaturation at 94°C for 30 seconds, 57°C for 40 seconds, and 72°C for 1
minute and 30 seconds; and the final extension at 72°C for 10 minutes.

SBE Template Preparation 

1 µl of Exonuclease I (Amersham Life Science, 10 U/µl) and 1 µl of Shrimp
Alkaline Phosphatase (Amersham Life Science, 1 U/µl) were added to 25 µl of PCR
products (see above), and incubated at 37°C for 1 hour. The enzymes were
inactivated at 100°C for 15 minutes. The enzymatically treated samples were applied to
an S-300 column (Pharmacia) to further reduce the residual PCR primers and dNTPs
and replace the buffer with ddH2O.



Multiplexing SBE Reaction

SBE is carried out in a 33 µl reaction, using 6 µl of the template (see above),
[…] nM of each SBE primer, 2.5 units of Thermo Sequenase (Amersham), 52 mM Tris-HCl
(pH 9.5), 6.5 mM MgCl2, 25 µM of fluorescein-N6-ddNTPs (NEN), 7.5 µM
biotin-N6-ddUTP or biotin-N6-dCTP, or 3.75 µM biotin-N6-ddATP, and 10 µM of the
other cold ddNTPs.

The extension reaction was carried out on a Thermo Cycler (MJ Research), with 1
cycle of 96°C for 3 minutes, then 45 cycles of 94°C for 20 seconds and 58°C for 11
seconds.

After the SBE reaction, 9 reactions from each sample were combined and mixed
with 30 µl of 100 µg/ml glycogen (Boehringer Mannheim), 18.75 µl of 8 M LiCl
(Sigma), and 1125 µl of pre-chilled (-20°C) ethanol (Abs.), and precipitated by
centrifugation at top speed (Eppendorf centrifuge 5415C) for 15 minutes at room
temperature; precipitated samples were dried at 40°C for 40 minutes and re-suspended
in 33 µl ddH2O.

Tag Array Design and Hybridization

For each tag sequence, two probes were synthesized on the array. One is exactly
the designed tag sequence (referred to as a Perfect Match, or PM probe). The other one
is identical except for a single base difference in a central position (referred to as a
Mismatch, or MM probe). The mismatch probe serves as an internal control for
hybridization specificity. Over 32,000 20-mer tag probes (and their companions) were
chosen 11 and fabricated on an 8 mm x 8 mm array. Each probe (feature) occupies a
30 micron x 30 micron area. The sets of arrays were synthesized together on a single
glass wafer on which 100 arrays were made.

The labeled sample was denatured at 95°C - 100°C for 10 minutes and snap
cooled on ice for 2 - 5 minutes. The tag array was pre-hybridized with 6X SSPE-T (0.9
M NaCl, 60 mM NaH2PO4, 6 mM EDTA (pH 7.4), 0.005% Triton X-100) + 0.5 mg/ml
of BSA for a few minutes, then hybridized with 120 µl hybridization solution (as
described below) at 42°C for 2 hours on a rotisserie, at about 40 RPM. The hybridization
solution consists of 3 M TMACl (tetramethylammonium chloride), 50 mM MES
((2-[N-morpholino]ethanesulfonic acid) sodium salt) (pH 6.7), 0.01% Triton X-100,
0.1 mg/ml of herring sperm DNA, 50 pM of fluorescein-labeled control oligo, 0.5
mg/ml of BSA (Sigma) and 29.4 µl labeled SBE products (see above) in a total of 120
µl reaction.

The chips were rinsed twice with 1X SSPE-T for about 10 seconds at room
temperature, then washed with 1X SSPE-T for 15-20 minutes at 40°C on a rotisserie,
at about 40 RPM, and then washed 10 times with 6X SSPE-T at 22°C on a fluidic
station (FS400, Affymetrix). The chips were stained at room temperature with 120 µl
staining solution (2.2 µg/ml streptavidin R-phycoerythrin (Molecular Probes), and 0.5
mg/ml acetylated BSA, in 6X SSPE-T) on a rotisserie for 15 minutes, at about 40 RPM.
After staining, the probe array was washed 10 times again with 6X SSPE-T on the
FS400 at 22°C. The chips were scanned on a confocal scanner (Affymetrix) with a
resolution of 60-70 pixels per feature, and two filters (530-nm and 560-nm,
respectively). GeneChip Software (Affymetrix) is used to convert the image files into
digitized files for further data analysis.

Clustering Analysis

For a given marker (at a given tag probe position), the intensity of each of the
two colors (fluorescein and phycoerythrin) was calculated as the intensity at the perfect
match position (PM) minus that at the mis-match position (MM). Negative fluorescein
or phycoerythrin intensity values are treated as if they were zero. The Phat values were
computed as the ratio of the intensities (fluorescein/(fluorescein + phycoerythrin)). The
Phat values were sorted, and the optimal set of ranges for AA, AB and BB genotypes
given the hypothesis of 2 or 3 clusters was considered, subject to the following rules: at
most 4 points (outliers) may be excluded from the genotype ranges. For 2 groups, the
total range of Phat values must be at least 0.3. For 3 groups, the total range of Phat values
must be at least 0.5. Ranges must be separated by a gap of at least 0.1. The width of a
range may be at most 0.4. A score was then computed as: Score = 1 - (sum of range
widths / total range) - (outliers * 0.1).

The set of ranges with the best score was found and used to call genotypes. This
score increases with narrow ranges, while it decreases with the number of points that are
left out of any range. Therefore, it tends to be optimal when all the Phat values are
contained within relatively small ranges.
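
The Phat calculation and the scoring rules above can be written out as a short Python
sketch. It only scores one candidate set of ranges and does not perform the search for
the best set. Two points are assumptions not stated in the text: "total range" is taken
to be the maximum minus the minimum of all Phat values for the marker, and a point
counts as inside a range when it lies within the closed interval. The function names
are hypothetical.

```python
def phat(fluorescein_pm, fluorescein_mm, pe_pm, pe_mm):
    """Phat = fluorescein / (fluorescein + phycoerythrin), using PM - MM
    intensities with negative values treated as zero."""
    f = max(fluorescein_pm - fluorescein_mm, 0.0)
    p = max(pe_pm - pe_mm, 0.0)
    total = f + p
    return f / total if total > 0 else None  # None: no usable signal


def score_ranges(values, ranges, max_outliers=4, min_gap=0.1, max_width=0.4):
    """Score a candidate set of 2 or 3 genotype ranges, or None if a rule fails.

    `ranges` is a list of (low, high) Phat intervals, one per genotype cluster.
    """
    if not values:
        return None
    values = sorted(values)
    ranges = sorted(ranges)
    min_total = {2: 0.3, 3: 0.5}.get(len(ranges))
    if min_total is None:
        return None
    total_range = values[-1] - values[0]  # assumed meaning of "total range"
    if total_range < min_total:
        return None
    if any(high - low > max_width for low, high in ranges):
        return None
    if any(b[0] - a[1] < min_gap for a, b in zip(ranges, ranges[1:])):
        return None
    outliers = sum(
        1 for v in values if not any(low <= v <= high for low, high in ranges)
    )
    if outliers > max_outliers:
        return None
    widths = sum(high - low for low, high in ranges)
    return 1.0 - widths / total_range - 0.1 * outliers
```

A full genotype caller would enumerate candidate range sets over the sorted Phat values
and keep the one for which `score_ranges` is highest.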
ABI Sequencing to Determine Genotypes

To independently confirm the genotypes called from the tag array assay, three
samples (904957000000, 904896000000, and 904889000000) were sequenced using a
gel-electrophoresis based method. Samples were amplified for all sites with T7 and T3
tagged primers, using standard PCR cycling conditions (2.5 µl of 20 ng/µl DNA, 0.375
µl of 20 µM primer (X2), 1.5 µl of 10X PCR buffer, 0.9 µl 25 mM Mg, 0.15 µl
10 mM dNTPs, 0.25 µl 10 U/µl Taq DNA Polymerase (Sigma), brought up to 15 µl
with ddH2O per tube). Some products were sequenced directly, while an M13 nesting
strategy was used due to the close proximity of the polymorphic base to the primer end.
Samples from the initial amplification were diluted 1:50 with ddH2O, and amplified with
M13F-T7 (TGTAAAACGACGGCCAGTTAATACGACTCACTATAGGGAGA; SEQ
ID NO: 9) and M13R-T3
(AACAGCTATGACCATGAATTAACCCTCACTAAAGGGAGA; SEQ ID NO: 10)
primers using standard PCR conditions. All PCR products were cleaned with
Exonuclease I (Amersham, 0.15 µl of 10 U/µl per well) and Shrimp Alkaline
Phosphatase (Amersham, 0.30 µl of 1 U/µl per well) in a volume of 10 µl. Dye
terminator sequencing using an M13R primer (AACAGCTATGACCATG; SEQ ID NO:
11) or T7 primer (TAATACGACTCACTATAGGGAGA; SEQ ID NO: 12) on an
ABI377 (Perkin Elmer) using Big Dyes (Perkin Elmer) was performed to determine the
genotype status for each SNP in all three individuals. Trace files were read with Edit
View 1.0 (Perkin Elmer) software.

EXAMPLE 1

DNA from an individual is isolated, and amplified with primers from 15
previously-characterized (i.e., known) SNPs. Amplification is allowed to proceed as
described in Hudson, T.J. et al. (Science 270:1945-1954 (1995)) and Dietrich et al.
(Dietrich, W. F. et al., Nature 380:149-152 (1996); Dietrich, W. F. et al., Nature
Genetics 7:220-245; Dietrich, W. et al., Genetics 131:423-447 (1992)). For example, in
a 50 µl reaction volume, 0.5 ng of template nucleic acid/target polynucleotide is added
to 1 µM forward amplification primer, 1 µM reverse amplification primer, 200 µM
dGTP, 200 µM dTTP, 200 µM dATP, 3.5 mM MgCl2, 1.0 mM Tris-HCl (pH 8.3), 50
mM KCl, 0.02 µM molecular probe, and 0.25 units of polymerase enzyme. The
reaction mixture can then be subjected to a two-step amplification process, performed
on a Tetrad (MJ Research, Watertown, Massachusetts), with the conditions:
denaturation at 94°C for 60 seconds, followed by an annealing/extension step at 53°-
56°C for one minute. The denaturation and annealing/extension steps are repeated for
40 cycles. Alternatively, a three-step thermocycling reaction can be used, such as 94°C
for 60 seconds, followed by annealing at 53°-56°C for 30 seconds, followed by
extension at 72°C for one minute, the three steps being repeated for 40 cycles. This may
be followed by an optional extension step at 72°C for five minutes.

After amplification is complete, locus-specific tagged oligonucleotides specific
for the 10 SNPs are added, and are allowed to hybridize to the amplification products.

Reagents for a single base extension reaction are then added, where each of the
four ddNTPs is labeled with a different fluorophore. Single base extension is then
performed as described by Kobayashi et al. (Mol. Cell. Probes 9:175-182 (1995)).

After the reaction is complete, the reaction products are placed in contact with
the universal array, and the reaction products allowed to hybridize, each product to its
appropriate oligonucleotide tag on the array. The chip is then assayed in a fluorometer,
and the wavelength emitted at each address in the array is recorded. From this data, the
genotype at each individual SNP is determined.

EXAMPLE 2 

Two alleles of template were mixed at ratios of 1:30, 1:10, 1:3, 1:1, 3:1, 10:1,
and 30:1. These were labeled with different color labels by single-base extension
reaction and hybridized to a tag array. A correlation was observed between the signal
intensity ratio and the template concentration ratio over a 900-fold dynamic range. See

EXAMPLE 3

A set of tag sequences is selected such that the tags are likely to have similar
hybridization characteristics and minimal cross-hybridization to other tag sequences.
An oligonucleotide array of all of the tags is fabricated. The design and use of such a
4,000-20mer-tag array for the functional analysis of the yeast genome has been
described (1). More recently, Affymetrix designed and fabricated an array with a set of
more than 16,000 such tags. The tag sequence synthesized on the chip can be 20-mer,
25-mer, or other lengths.

EXAMPLE 4 

Marker specific primers are used to amplify each genetic marker (e.g., SNP). A
multiplex PCR strategy is used to amplify these markers from genomic DNAs of tested
individuals (2). After PCR amplification, excess primers and dNTPs are removed
enzymatically. These enzymatically treated PCR products then serve as templates in the
next SBE reaction. Note that these templates (PCR products) are double
stranded, which differs from the templates used in other protocols (3, 4). For
example, in Minisequencing (3) and Genetic Bit Analysis (GBA, 4), a double stranded
template has to be converted to a single stranded template prior to the base extension
reaction. The methods used for this conversion are costly, laborious, and hard to
automate.

EXAMPLE 5

In the protocol described below, an SBE primer is designed for each genetic
marker which terminates 1 base before the polymorphic site. However, other primer
design schemes can be used. The primer for each marker is tailed with a unique tag
which is complementary to a specific probe sequence synthesized on the tag chip. The
extension reaction is multiplex, in which SBE primers corresponding to multiple
markers were added in a single reaction tube, and extended in the presence of pairs of
ddNTPs labeled with different fluorophores. For an A/C variant, there might be a
ddATP-red and ddCTP-green.

EXAMPLE 6 

The resulting mixture is hybridized to the tag array. Each tag corresponds to a
single marker. The ratio of the intensities of the colors indicates the genotype (or the
allele frequency, ranging from 0% to 100%) of the samples tested.

EXAMPLE 7

SBE template preparation: Marker specific primers are used to amplify each
single nucleotide polymorphism (SNP). A multiplex PCR strategy is used to amplify
these SNPs (Science 280:1077-1082, 1998).

Multiplex PCR: 

Multiplex PCR reaction is carried out with AmpliTaq Gold and 25 primer pairs
in a 25 µl reaction volume. SNPs with the same base composition at the polymorphic site
(i.e., A/G, T/C, etc.) are pooled together.

PCR reagents:

10X PCR Multiplex Buffer (II):  100 mM Tris/HCl (pH 8.3)
                                500 mM KCl
25 mM dNTPs
F & R Primers (for each primer, the conc. is 1 µM)
20 ng/µl Genomic DNA

Multiplex PCR reaction (25 µl)

Primer Mix (1 µM each)          2.5 µl
Genomic DNA (20 ng/µl)          2.5 µl
10X PCR Buffer II               2.5 µl
25 mM MgCl2                     5 µl
25 mM dNTPs                     1 µl
AmpliTaq Gold (5 U/µl)          0.4 µl
ddH2O                           up to 25 µl
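
As a cross-check of the recipe above, the final concentrations in the 25 µl reaction
follow from C_final = C_stock x V_added / V_total. The Python sketch below works
through that arithmetic; the function and variable names are illustrative only, and the
stock values are those listed in the table.

```python
TOTAL_UL = 25.0

# Volumes pipetted per reaction (ul), as listed in the table.
PIPETTED_UL = {
    "primer mix": 2.5,
    "genomic DNA": 2.5,
    "10X PCR buffer II": 2.5,
    "25 mM MgCl2": 5.0,
    "25 mM dNTPs": 1.0,
    "AmpliTaq Gold": 0.4,
}


def water_volume(pipetted, total_ul):
    """Volume of ddH2O needed to bring the reaction up to `total_ul`."""
    return total_ul - sum(pipetted.values())


def final_concentration(stock, volume_ul, total_ul):
    """Dilution formula; the result has the same unit as `stock`."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    print("ddH2O (ul):", round(water_volume(PIPETTED_UL, TOTAL_UL), 2))
    print("each primer (uM):", final_concentration(1.0, 2.5, TOTAL_UL))
    print("Tris-HCl (mM):", final_concentration(100.0, 2.5, TOTAL_UL))
    print("KCl (mM):", final_concentration(500.0, 2.5, TOTAL_UL))
    print("MgCl2 (mM):", final_concentration(25.0, 5.0, TOTAL_UL))
    print("dNTPs (mM):", final_concentration(25.0, 1.0, TOTAL_UL))
    print("genomic DNA (ng):", 20.0 * 2.5)
    print("AmpliTaq Gold (units):", 5.0 * 0.4)
```

These values agree with the concentrations given in the Multiplexing PCR section of the
Methods (0.1 µM of each primer, 1 mM dNTPs, 10 mM Tris-HCl, 50 mM KCl, 5 mM MgCl2,
50 ng of genomic DNA and 2 units of enzyme).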



PCR conditions

96°C    10 min

40 cycles:
    94°C    30 sec
    57°C    40 sec
    72°C    1 min 30 sec

72°C    10 min
4°C     O/N

Enzymatic treatment of PCR products to degrade and de-phosphorylate the unused
primers and dNTPs, respectively:

To 25 µl of PCR products, add 1 µl of Exonuclease I (Amersham Life Science,
10 U/µl) and 1 µl of Shrimp Alkaline Phosphatase (Amersham Life Science, 1 U/µl),
and incubate at 37°C for 1 hour. Inactivate the enzymes at 100°C for 15
minutes. Apply the sample to an S-300 column (Pharmacia) to further reduce the
residual PCR primers and dNTPs, and replace the buffer with ddH2O. The sample is
ready for the next SBE reaction.

Single Base Extension (SBE):

An SBE primer is designed for each SNP which terminates 1 base before the
polymorphic site. The primer for each SNP is tailed with a unique tag which is
complementary to a specific probe sequence on the tag chip. The SBE reaction is also
multiplexed at 25-plex.



Reaction Mixture (33 µl):

Template (see above)                                  6 µl
SBE Primer mix (20 nM for each primer)                2.5 µl
5X Thermo Sequenase buffer                            6.6 µl
Bio-(d)dNTP (X nmol/µl*, NEN)                         0.5 µl
Flu-ddNTP (1 nmol/µl, NEN)                            0.3 µl
Other two cold ddNTPs (1 nmol/µl, Biopharmacia)       0.3 µl each
Thermo Sequenase (6.4 U/µl) (Amersham)                0.4 µl
ddH2O                                                 up to 33 µl

* X = 0.5 when it is Bio-ddUTP or bio-dCTP (0.5 mM), or X = 0.25 when it is
bio-ddATP (0.25 mM)

PCR program:

96°C    3'      1 cycle
94°C    25"
58°C    11"     45 cycles
4°C     forever

Precipitation:

After the SBE reaction, combine 9 tubes for each sample, mix with 50 µl of
100 µg/ml glycogen (Boehringer Mannheim), then precipitate with 18.75 µl of 8 M
LiCl and 1125 µl of pre-chilled (-20°C) ethanol (Abs.). Mix well, then centrifuge at
top speed (Eppendorf centrifuge 5415C) for 15 min at room temperature. Decant the
supernatant, dry the samples at 40°C for 40 min, and re-suspend the samples in 33 µl
ddH2O. The sample is now ready for hybridization.

Hybridization: 

The prepared sample is denatured at 100°C for 10 minutes and snap cooled on
ice for 2-5 minutes. The universal tag chip is pre-hybridized with 6X SSPE-T (0.9 M
NaCl, 60 mM NaH2PO4, 6 mM EDTA (pH 7.4), 0.005% Triton X-100) + 0.5 mg/ml of
BSA, then hybridized with 120 µl hybridization solution (as shown below) at 42°C for 2
hours on a rotisserie, at about 40 RPM.

The hybridization solution contains:

5 M TMACl                 72 µl
0.5 M MES (pH 6.7)        12 µl
1% Triton X-100           1.2 µl
HS DNA (10 mg/ml)         1.2 µl
Flu-c213 (5 nM)           1.2 µl
BSA (20 mg/ml)            3.0 µl

Plus 29.4 µl prepared sample (see above).
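
The volumes in this table can be derived from the final concentrations stated in the
Methods (3 M TMACl, 50 mM MES, 0.01% Triton X-100, 0.1 mg/ml herring sperm DNA, 50 pM
control oligo, 0.5 mg/ml BSA in 120 µl) using V_added = C_final x V_total / C_stock.
The following Python sketch shows that calculation; the names are illustrative, and
each stock/final pair is expressed in a common unit chosen for the example.

```python
TOTAL_UL = 120.0

# (component, stock concentration, desired final concentration), same unit per row
RECIPE = [
    ("TMACl (M)", 5.0, 3.0),
    ("MES pH 6.7 (mM)", 500.0, 50.0),
    ("Triton X-100 (%)", 1.0, 0.01),
    ("herring sperm DNA (mg/ml)", 10.0, 0.1),
    ("fluorescein control oligo (pM)", 5000.0, 50.0),  # 5 nM stock
    ("BSA (mg/ml)", 20.0, 0.5),
]


def volume_to_add(stock, final, total_ul):
    """Volume of stock (ul) that gives `final` concentration in `total_ul`."""
    return final * total_ul / stock


volumes = {name: volume_to_add(stock, final, TOTAL_UL) for name, stock, final in RECIPE}

# Whatever volume remains is available for the re-suspended SBE products.
sample_ul = TOTAL_UL - sum(volumes.values())

if __name__ == "__main__":
    for name, vol in volumes.items():
        print(f"{name}: {vol:.1f} ul")
    print(f"prepared sample: {sample_ul:.1f} ul")
```

Under these assumptions the reagent volumes sum to 90.6 µl, leaving 29.4 µl for the
prepared sample, which matches the table.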

Post-Hybridization Wash:

Rinse the chip twice with 1X SSPE-T for 10 seconds, then wash with 1X SSPE-T for
15-20 min at 40°C on a rotisserie, at about 40 RPM. Then wash on a fluidic station
(FS400, Affymetrix) 10 times with 6X SSPE-T at 22°C.

Staining:

Stain the chip at room temperature with 120 µl staining solution (2.2 µg/ml
streptavidin R-phycoerythrin (Molecular Probes), and 0.5 mg/ml acetylated BSA, in 6X
SSPE-T) on a rotisserie for 15 minutes, at about 40 RPM. After staining, the probe array
is washed 10 times again with 6X SSPE-T on the FS400 at 22°C.

Scanning:

The chips were scanned on a confocal scanner (Affymetrix) with a resolution of
60-70 pixels per feature, and two filters (530-nm and 560-nm, respectively). GeneChip
Software (Affymetrix) is used to convert the image files into digitized files for further
data analysis.

EXAMPLE 7 

Genotyping With High-Density Oligonucleotide "Tag" Arrays 

A genotyping method based on the use of a high-density "tag" array that
contains over 32,000 pre-selected 20-mer oligonucleotide probes, combined with
marker-specific PCR amplifications and single base extension (SBE) 1,2 reactions, has
been developed. We have used this method to genotype a collection of 144
single-nucleotide polymorphisms (SNPs) identified from 49 hypertension candidate
genes 3. First, marker-specific primers were used in multiplex PCR reactions to amplify
specific genomic regions containing the SNPs. The PCR amplified DNA products were
then used as templates in SBE reactions. Each SBE primer comprises a 3' portion and a
5' portion. The 3' portion is complementary to the specific SNP locus and terminates
one base before the polymorphic site. The 5' portion comprises a unique sequence,
which is complementary to a specific oligonucleotide probe synthesized on the "tag"
array. The extension reaction is multiplex, with SBE primers corresponding to multiple
SNPs in a single reaction tube. The primers are extended in the presence of two-color
labeled ddNTPs, and the resulting mixture is hybridized to the tag array. The intensity
ratio of the two colors was used to deduce the genotypes of the samples tested.

The tag array strategy begins with an array of tag sequences selected so
that all tag probes are of the same length, e.g., 20 nucleotides long, with similar melting
temperature and G-C content, and the lowest sequence homology to each other 11.
Therefore, these tags are likely to have similar hybridization characteristics and minimal
cross-hybridization to other tag sequences.

The design and use of a 4,000-tag array for the functional analysis of yeast
Saccharomyces cerevisiae genes 11 and drug sensitivity studies 12 have been described.
More recently, we have designed and fabricated an array that contains more than 32,000
such tags, and developed it as a genotyping tool, in combination with marker-specific
PCR amplifications and SBE reactions.

As shown in Fig. 7, marker specific primers are designed and used to amplify
each single nucleotide polymorphism (SNP). A multiplex PCR strategy is used to
amplify these SNPs from genomic DNAs 9. In general, SNPs with the same base
composition at the polymorphic site (e.g., all the A/G polymorphisms) are grouped
together. After PCR amplification, excess primers and dNTPs are degraded and
de-phosphorylated using Exonuclease I and Shrimp Alkaline Phosphatase, respectively.
These enzymatically treated PCR products (double-stranded) then serve as
templates in the SBE reaction. An SBE primer is designed for each genetic marker,
which terminates one base before the polymorphic site. Each primer is tailed with a
unique tag that is complementary to a specific probe sequence synthesized on the tag
array. The extension reaction is multiplex, in which SBE primers corresponding to
multiple markers (up to 56 markers that we have tested so far) were added in a single
reaction tube, and extended in the presence of pairs of ddNTPs labeled with different
fluorophores, e.g., for an A/G variant, biotin-labeled ddATP and fluorescein-labeled
ddGTP are used. The resulting mixture of SBE reactions is hybridized to the tag array.
Each tag hybridizes to a specific probe position on the chip. The ratio of the intensities
of the colors indicates the genotype (homozygous wild type, or homozygous mutant, or
heterozygous) or the allele frequency (ranging from 0% to 100%) in the samples tested.

In a comparison of the results of using single-stranded and double-stranded PCR
products as the templates in the current SBE/tag array assay, no significant difference
was found (data not shown). However, in previously published protocols of
minisequencing 13-15 and genetic bit analysis 16,17, a double-stranded template has to be
converted to a single-stranded template prior to the base extension reaction. The
methods used for this conversion were costly, laborious, and hard to automate.

The tag array assay provides a fairly accurate quantitative measurement of the
allele frequency in samples tested. As shown in Figure 2, we have synthesized two
artificial SBE templates. They are identical except at the 21st position: T in template-T,
and G in template-G. We then mixed the two templates at ratios of 1:10, 1:3, 1:1, 3:1,
10:1, and 30:1, which is a 300-fold dynamic range. Six SBE primers, which have the
same 3' portion (the portion complementary to the template sequence) but different 5'
portions (the portion complementary to the tag probes on the tag arrays), were designed
(Figure 2) and extended in the presence of the SBE templates mixed at different ratios,
and biotin-labeled ddATP and fluorescein-labeled ddCTP. As shown in Fig. 8, the
intensity ratio of the two colors and the template concentration ratio (i.e. the allele
frequency) appear to form a fairly good linear correlation in the 300-fold dynamic
range that we tested.
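The mixing series can be checked with a short Python sketch. The ratios are the
ones listed in the text; the calibration function is only an assumption that
takes the reported linear relation at face value, with a hypothetical slope
parameter that would have to be fitted from data such as Fig. 8.

```python
from fractions import Fraction

# Template-T : template-G mixing ratios listed in the text.
MIX_RATIOS = [(1, 10), (1, 3), (1, 1), (3, 1), (10, 1), (30, 1)]


def dynamic_range(ratios):
    """Largest ratio divided by smallest ratio in the series."""
    values = [Fraction(t, g) for t, g in ratios]
    return max(values) / min(values)


def allele_frequency(t_parts, g_parts):
    """Fraction of template-T in a T:G mixture."""
    return Fraction(t_parts, t_parts + g_parts)


def estimate_template_ratio(intensity_t, intensity_g, slope=1.0):
    """Invert an assumed linear model: intensity ratio = slope * template ratio.

    `intensity_t` is the signal from the label incorporated on template-T
    (ddATP in the text), `intensity_g` the signal from the label
    incorporated on template-G (ddCTP). `slope` is hypothetical.
    """
    if intensity_g <= 0 or slope <= 0:
        raise ValueError("intensity_g and slope must be positive")
    return (intensity_t / intensity_g) / slope


if __name__ == "__main__":
    print(dynamic_range(MIX_RATIOS))
    for t, g in MIX_RATIOS:
        print(f"{t}:{g}", float(allele_frequency(t, g)))
```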

To further test the robustness and the efficiency of the tag array/SBE assay
method for genotyping applications, we set out to type a portion of the SNPs that we had
identified from a large-scale polymorphism screening study with the hypertension
candidate genes 3. Initially, we selected 173 SNPs from 56 hypertension candidate
genes. These SNPs were chosen because they occur in promoter regions, splicing
junctions, or coding regions in which the nucleotide changes cause amino acid
changes. We reason that these SNPs are good candidates for functional
mutations that predispose to the disease. Therefore, the assay developed in this study
could then be used in large-scale association studies in hypertension. PCR primers were
designed and tested individually for these 173 SNPs. 8 of them (4.6%) failed to amplify.
SBE primers were then designed for the remaining 165 SNPs. A multiplexing PCR and
multiplexing SBE assay was developed with a complexity of 9 to 23 markers in each
reaction and a total of 9 reactions for the 165 markers. 21 of them (12.7%) failed in the
multiplexing PCR and multiplexing SBE assay. Therefore, 144 markers from 49 genes
passed the assay development. The gene location, polymorphic sites, and the designed
primers for these 144 markers are summarized in Table 1.
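The marker counts reported above can be verified with a few lines of Python. All
numbers are taken from the text; the variable and function names are only
illustrative.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


selected = 173           # SNPs initially selected
failed_pcr = 8           # failed individual PCR amplification
sbe_designed = selected - failed_pcr
failed_multiplex = 21    # failed multiplex PCR / multiplex SBE
passed = sbe_designed - failed_multiplex

# 9 reactions of 9 to 23 markers each must be able to hold 165 markers.
reactions, min_plex, max_plex = 9, 9, 23
fits = reactions * min_plex <= sbe_designed <= reactions * max_plex

if __name__ == "__main__":
    print(sbe_designed, percent(failed_pcr, selected))
    print(passed, percent(failed_multiplex, sbe_designed))
    print(fits)
```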

We then genotyped 44 individuals using 44 tag arrays. Good hybridization
signals were obtained in 96.5% (6116/6336 (144 x 44)) of the cases. The signal
intensity values from the hybridization results were used in clustering analysis for each
of the 144 markers. Genotypes for each individual at the 144 loci were assigned
automatically based on the clustering results, with some manual editing. Data Desk 6.0
(Data Description, Inc.) was used to manually display the clustering analysis results (of
the intensity ratios of the two colors). Overall, 80-85% of the markers form good
cluster(s).
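As a rough illustration of how a two-color signal maps to a genotype call, the
Python sketch below uses fixed thresholds on the fraction of signal in one color.
This is a simplified stand-in and not the clustering analysis used in the study;
the threshold values, the function names, and the generic allele names "A" and
"B" are all assumptions.

```python
def allele_fraction(signal_a, signal_b):
    """Fraction of total signal contributed by the allele-A label."""
    total = signal_a + signal_b
    if total <= 0:
        raise ValueError("no hybridization signal")
    return signal_a / total


def call_genotype(signal_a, signal_b, low=0.2, high=0.8):
    """Threshold-based call; `low` and `high` are hypothetical cutoffs."""
    fraction = allele_fraction(signal_a, signal_b)
    if fraction >= high:
        return "A/A"
    if fraction <= low:
        return "B/B"
    return "A/B"


if __name__ == "__main__":
    for pair in [(900, 50), (500, 480), (30, 800)]:
        print(pair, call_genotype(*pair))
```

Per-marker clustering, as described in the text, has the advantage that it adapts
to markers whose two labels do not give equal signal, which fixed cutoffs cannot do.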

We have performed gel-based DNA sequencing to determine the genotypes
at 115 loci in 3 of the 44 individuals (see Methods). Comparison of the ABI sequencing
results and the chip results revealed 14 discrepancies (4%) out of 115 x 3 = 345
genotype calls. Most of the discrepancies occurred in cases where one method called
homozygous, while the other method called heterozygous. In one case (marker
ICAM1ex6.254), where the ABI sequencing method called G/G but the tag array/SBE
assay method called A/A in all three individuals, we believe the discrepancies are
due to mis-priming of the SBE primer to adjacent sequences.
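A comparison of this kind can be sketched in Python as follows. The data layout
(dictionaries keyed by marker and individual) and the function names are
assumptions made for illustration; only the counts 14 and 115 x 3 come from the
text.

```python
def normalize(call):
    """Make 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(call.split("/")))


def compare_calls(chip_calls, seq_calls):
    """Return (mismatching keys, number of shared calls)."""
    shared = sorted(chip_calls.keys() & seq_calls.keys())
    mismatches = [
        key for key in shared
        if normalize(chip_calls[key]) != normalize(seq_calls[key])
    ]
    return mismatches, len(shared)


def discrepancy_percent(n_mismatch, n_calls):
    return 100 * n_mismatch / n_calls


if __name__ == "__main__":
    # Hypothetical toy data: keys are (marker, individual).
    chip = {("m1", "i1"): "A/A", ("m1", "i2"): "G/A", ("m2", "i1"): "A/G"}
    seq = {("m1", "i1"): "A/A", ("m1", "i2"): "A/G", ("m2", "i1"): "G/G"}
    print(compare_calls(chip, seq))
    print(round(discrepancy_percent(14, 115 * 3), 1))
```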

We also tested the reproducibility of the tag array/SBE assay genotyping
method. We repeated the multiplexing PCR, SBE and chip hybridization
experiments in 4 individuals. The ratios of the two colors (for each of the 144 markers)
in the replicated experiments are not all exactly the same, but they all fall into the same
cluster (i.e. giving the same genotype call). Therefore, we did not find any discrepancy in
the genotyping calls of duplicated samples.

While this invention has been particularly shown and described with references to
preferred embodiments thereof, it will be understood by those skilled in the art that
various changes in form and details may be made therein without departing from the
scope of the invention encompassed by the appended claims.

CLAIMS

What is claimed is:

1. An oligonucleotide array comprising one or more oligonucleotide tags fixed to a
solid substrate, wherein each oligonucleotide tag comprises a unique known
arbitrary nucleotide sequence of sufficient length to hybridize to a locus-specific
tagged oligonucleotide, wherein the locus-specific tagged oligonucleotide has at
its first end nucleotide sequence which hybridizes to, e.g., is complementary to,
the arbitrary sequence of the oligonucleotide tag.

2. A kit comprising:

(a) an array comprising one or more oligonucleotide tags fixed to a solid
substrate, wherein each oligonucleotide tag comprises a unique known
arbitrary nucleotide sequence of sufficient length to hybridize to a
locus-specific tagged oligonucleotide; and

(b) one or more locus-specific tagged oligonucleotides, wherein each
locus-specific tagged oligonucleotide has at its first (5') end nucleotide
sequence which hybridizes to, e.g., is complementary to, the arbitrary
sequence of a corresponding oligonucleotide tag on the array, and has at
its second (3') end nucleotide sequence complementary to target
polynucleotide sequence in a sample.

3. A method of genotyping a nucleic acid sample at one or more loci, comprising
the steps of:

(a) obtaining a nucleic acid sample to be tested;

(b) combining the nucleic acid sample with one or more locus-specific
tagged oligonucleotides under conditions suitable for hybridization of the
nucleic acid sample to one or more locus-specific tagged
oligonucleotides, wherein each locus-specific tagged oligonucleotide
comprises a nucleotide sequence capable of hybridizing to a
complementary sequence in an oligonucleotide tag and a nucleotide
sequence complementary to the nucleotide sequence 5' of a nucleotide to
be queried in the sample, thereby creating an amplification
product-locus-specific tagged oligonucleotide complex;

(c) subjecting the complex to a single base extension reaction, wherein the
reaction results in the addition of a labeled ddNTP to the locus-specific
tagged oligonucleotide, and wherein each type of ddNTP has a label that
can be distinguished from the label of the other three types of ddNTPs;

(d) contacting the complex with an oligonucleotide array comprising one or
more oligonucleotide tags fixed to a solid substrate under suitable
hybridization conditions, wherein each oligonucleotide tag comprises a
unique arbitrary sequence complementary and of sufficient length to
hybridize to a complementary sequence in a locus-specific tagged
oligonucleotide, whereby the complex hybridizes to a specific
oligonucleotide tag on the array; and assaying the array to determine the
labeled ddNTPs present in the complex hybridized to one or more
oligonucleotide tags,

thereby determining the genotype of the queried nucleotide in the sample.

4. A method to aid in determining a ratio of alleles at a polymorphic locus in a
sample, comprising the steps of:

(a) using a pair of primers to amplify a region of a nucleic acid in a sample,
wherein the region comprises a polymorphic locus, whereby an amplified
DNA product is formed;

(b) labeling an extension primer by a single base extension reaction to form
a labeled extension primer, wherein the amplified DNA product is used
as a template, wherein the extension primer comprises a 3' portion and a
5' portion, wherein the 3' portion is complementary to the amplified
DNA product and terminates one nucleotide 5' to the polymorphic locus,
wherein the 5' portion is not complementary to the amplified DNA
product, whereby a labeled dideoxynucleotide which is complementary
to the polymorphic locus is coupled to the 3' end of the extension primer,
wherein each type of dideoxynucleotide present in the reaction bears a
distinct label; and

(c) hybridizing the 5' portion of the extension primer to one or more probes
complementary to the 5' portion which are immobilized to known
locations on a solid support.

5. The method of claim 4 wherein two complementary strands of the amplified
DNA product are present in the single base extension reaction.

6. The method of claim 4 wherein two complementary strands of the amplified
DNA product are used as templates in the step of labeling.

7. The method of claim 4 wherein the label is a fluorescent label.

8. The method of claim 4 wherein the label is a radiolabel.

9. The method of claim 4 wherein the label is an enzyme label.

10. The method of claim 4 wherein the label is an antigenic label.

11. The method of claim 4 wherein the label is an affinity binding partner.

12. The method of claim 4 further comprising the step of:

(d) optically detecting a fluorescent label on the solid support.

13. The method of claim 4 wherein the step of labeling employs at least two distinct
dideoxynucleotides bearing distinct labels.

14. The method of claim 4 wherein the step of labeling employs four distinct
dideoxynucleotides bearing distinct labels.

15. The method of claim 4 further comprising the steps of:

(d) comparing quantities of a first and a second label at a location on the
solid support; and

(e) determining the ratio of nucleotides present at the polymorphic locus in
the sample.

16. The method of claim 15 wherein the ratio of nucleotides present at two or more
polymorphic loci is determined simultaneously.

17. The method of claim 4 wherein the sample comprises DNA from two or more
individuals.

18. The method of claim 17 wherein the ratio of nucleotides present at two or more
polymorphic loci is determined simultaneously.

19. The method of claim 4 wherein the solid support is selected from the group
consisting of beads, microtiter plates, and oligonucleotide arrays.

20. A set of primers for use in determining a ratio of nucleotides present at a
polymorphic locus, comprising:

(a) a pair of primers which when in the presence of a DNA polymerase
amplify a region of double stranded DNA, wherein the region comprises
a polymorphic locus; and

(b) an extension primer which comprises a 3' portion which is
complementary to a portion of the region of double stranded DNA and a
5' portion which is not complementary to the region of double stranded
DNA, wherein the extension primer terminates one nucleotide 5' to the
polymorphic locus.

21. A kit comprising in a single container two or more of the sets of primers of
claim 20.

22. A kit comprising in a single container:

(a) a set of primers of claim 20; and

(b) a solid support comprising a probe which is attached to a solid support,
wherein the probe is complementary to the 5' portion of the extension
primer.

23. The kit of claim 22 wherein the solid support is an oligonucleotide array.

24. The kit of claim 22 wherein the solid support is a bead.

25. The kit of claim 22 wherein the solid support is a microtiter plate.

26. A method to aid in determining a ratio of alleles at a polymorphic locus in a
sample, comprising the steps of:

(a) labeling an extension primer by a single base extension reaction to form
a labeled extension primer, using a DNA molecule as a template,
wherein the extension primer comprises a 3' portion and a 5' portion,
wherein the 3' portion is complementary to the DNA molecule and
terminates one nucleotide 5' to a polymorphic locus, wherein the 5'
portion is not complementary to the DNA molecule, whereby a labeled
dideoxynucleotide which is complementary to the polymorphic locus is
coupled to the 3' end of the extension primer, wherein each type of
dideoxynucleotide present in the reaction bears a distinct label; and

(b) hybridizing the 5' portion of the extension primer to one or more probes
complementary to the 5' portion which are immobilized to known
locations on a solid support.

27. The method of claim 26 wherein two complementary strands of the DNA
molecule are present in the single base extension reaction.

28. The method of claim 27 wherein each complementary strand of the DNA
molecule is used as a template to label an extension primer.

29. The method of claim 26 wherein the label is a fluorescent label.

30. The method of claim 26 wherein the label is a radiolabel.

31. The method of claim 26 wherein the label is an enzyme label.

32. The method of claim 26 wherein the label is an antigenic label.

33. The method of claim 26 wherein the label is an affinity binding partner.

34. The method of claim 26 further comprising the step of:

(c) optically detecting a fluorescent label on the solid support.

35. The method of claim 26 further comprising the steps of:

(c) comparing quantities of a first and a second label at a location on the
solid support; and

(d) determining the ratio of nucleotides present at the polymorphic locus in
the sample.

36. The method of claim 35 wherein the ratio of nucleotides present at two or more
polymorphic loci is determined simultaneously.

37. The method of claim 26 wherein the sample comprises DNA from two or more
individuals.

38. The method of claim 37 wherein the ratio of nucleotides present at two or more
polymorphic loci is determined simultaneously.

39. The method of claim 26 wherein the step of labeling employs at least two
distinct dideoxynucleotides bearing distinct labels.

40. The method of claim 26 wherein the step of labeling employs four distinct
dideoxynucleotides bearing distinct labels.

Locus-specific tagged oligonucleotide:

synthetic sequence specific for a particular oligonucleotide tag on the array
(e.g., "Tag A," "Tag B," "Tag C," etc.)

sequence specific for the amplification product of a particular SNP
(e.g., SNP "A," SNP "B," SNP "C," etc.)

Fig. 2

Fig. 3 (labels: Amplification product; Locus-specific tagged oligonucleotide)

Fig. 4 (labels: Amplification product; Locus-specific tagged oligonucleotide; ddNTPs: T, A)

Fig. 5

SUBSTITUTE SHEET (RULE 26)

WO 00/58516

PCT/US00/08069

Fig. 6 (panel labels: Address A, Address B, Address C)

Fig. 7
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